diff --git a/.circleci/config.yml b/.circleci/config.yml index 8a6ebfe7dfa..d9e2e5dc886 100644 --- a/.circleci/config.yml +++ b/.circleci/config.yml @@ -2,15 +2,29 @@ version: 2.0 jobs: build: docker: - - image: alantrrs/cuda-opencv:latest + - image: datamachines/cudnn_tensorflow_opencv:11.2.0_2.4.1_4.5.1-20210211 +# - image: alexeyab84/dockerfiles:latest +# - image: alantrrs/cuda-opencv:latest # - image: nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04 working_directory: ~/work steps: - checkout + - run: nvcc --version + - run: gcc --version + - run: export PATH=$PATH:/usr/local/include/opencv4/ + - run: export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib/:/usr/lib/:/usr/lib64/ - run: make LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 -j 8 - run: make clean + - run: make LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 DEBUG=1 -j 8 + - run: make clean + - run: make LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 AVX=1 -j 8 + - run: make clean - run: make LIBSO=1 GPU=0 CUDNN=0 OPENCV=1 -j 8 - run: make clean + - run: make LIBSO=1 GPU=1 CUDNN=0 OPENCV=1 -j 8 + - run: make clean - run: make LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 -j 8 - run: make clean - run: make LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 CUDNN_HALF=1 -j 8 + - run: make clean + - run: make LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 CUDNN_HALF=1 USE_CPP=1 -j 8 \ No newline at end of file diff --git a/.github/FUNDING.yml b/.github/FUNDING.yml new file mode 100644 index 00000000000..0c5ae2e2b8f --- /dev/null +++ b/.github/FUNDING.yml @@ -0,0 +1,12 @@ +# These are supported funding model platforms + +github: # Replace with up to 4 GitHub Sponsors-enabled usernames e.g., [user1, user2] +patreon: # Replace with a single Patreon username +open_collective: # Replace with a single Open Collective username +ko_fi: # Replace with a single Ko-fi username +tidelift: # Replace with a single Tidelift platform-name/package-name e.g., npm/babel +community_bridge: # Replace with a single Community Bridge project-name e.g., cloud-foundry +liberapay: # Replace with a single Liberapay username +issuehunt: # Replace with a single IssueHunt username +otechie: # Replace with a single Otechie username +custom: ['https://paypal.me/alexeyab84', 'https://blockchain.coinmarketcap.com/address/bitcoin/36La9T7DoLVMrUQzm6rBDGsxutyvDzbHnp', 'https://etherscan.io/address/0x193d56BE3C65e3Fb8f48c291B17C0702e211A588#', 'https://explorer.zcha.in/accounts/t1PzwJ28Prb7Nk8fgfT3RXCr6Xtw54tgjoy'] # Replace with up to 4 custom sponsorship URLs e.g., ['link1', 'link2'] diff --git a/.github/ISSUE_TEMPLATE/any-other-question-or-issue.md b/.github/ISSUE_TEMPLATE/any-other-question-or-issue.md new file mode 100644 index 00000000000..0904576f88a --- /dev/null +++ b/.github/ISSUE_TEMPLATE/any-other-question-or-issue.md @@ -0,0 +1,25 @@ +--- +name: Any other question or issue +about: Any other question or issue +title: '' +labels: '' +assignees: '' + +--- + +If something doesn’t work for you, then show 2 screenshots: +1. screenshots of your issue +2. screenshots with such information +``` +./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg + CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1 + CUDNN_HALF=1 + OpenCV version: 4.2.0 + 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070 +net.optimized_memory = 0 +mini_batch = 1, batch = 8, time_steps = 1, train = 0 + layer filters size/strd(dil) input output + 0 conv 32 3 x 3/ 1 608 x 608 x 3 -> 608 x 608 x 32 0.639 BF +``` + +If you do not get an answer for a long time, try to find the answer among Issues with a Solved label: https://github.com/AlexeyAB/darknet/issues?q=is%3Aopen+is%3Aissue+label%3ASolved diff --git a/.github/ISSUE_TEMPLATE/bug-report.md b/.github/ISSUE_TEMPLATE/bug-report.md new file mode 100644 index 00000000000..3204f84765f --- /dev/null +++ b/.github/ISSUE_TEMPLATE/bug-report.md @@ -0,0 +1,27 @@ +--- +name: Bug report +about: Create a report to help us improve +title: '' +labels: '' +assignees: '' + +--- + +If you want to report a bug - provide: + * description of a bug + * what command do you use? + * do you use Win/Linux/Mac? + * attach screenshot of a bug with previous messages in terminal + * in what cases a bug occurs, and in which not? + * if possible, specify date/commit of Darknet that works without this bug + * show such screenshot with info +``` +./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg + CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1 + CUDNN_HALF=1 + OpenCV version: 4.2.0 + 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070 +net.optimized_memory = 0 +mini_batch = 1, batch = 8, time_steps = 1, train = 0 + layer filters size/strd(dil) input output +``` diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md new file mode 100644 index 00000000000..fd239774b79 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -0,0 +1,13 @@ +--- +name: Feature request +about: Suggest an idea for this project +title: '' +labels: Feature-request +assignees: '' + +--- + +For Feature-request: + * describe your feature as detailed as possible + * provide link to the paper and/or source code if it exist + * attach chart/table with comparison that shows improvement diff --git a/.github/ISSUE_TEMPLATE/training-issue---no-detections---nan-avg-loss---low-accuracy.md b/.github/ISSUE_TEMPLATE/training-issue---no-detections---nan-avg-loss---low-accuracy.md new file mode 100644 index 00000000000..b826cdd8c40 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/training-issue---no-detections---nan-avg-loss---low-accuracy.md @@ -0,0 +1,30 @@ +--- +name: Training issue - no-detections / Nan avg-loss / low accuracy +about: Training issue - no-detections / Nan avg-loss / low accuracy +title: '' +labels: Training issue +assignees: '' + +--- + +If you have an issue with training - no-detections / Nan avg-loss / low accuracy: + * read FAQ: https://github.com/AlexeyAB/darknet/wiki/FAQ---frequently-asked-questions + * what command do you use? + * what dataset do you use? + * what Loss and mAP did you get? + * show chart.png with Loss and mAP + * check your dataset - run training with flag `-show_imgs` i.e. `./darknet detector train ... -show_imgs` and look at the `aug_...jpg` images, do you see correct truth bounded boxes? + * rename your cfg-file to txt-file and drag-n-drop (attach) to your message here + * show content of generated files `bad.list` and `bad_label.list` if they exist + * Read `How to train (to detect your custom objects)` and `How to improve object detection` in the Readme: https://github.com/AlexeyAB/darknet/blob/master/README.md + * show such screenshot with info +``` +./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights data/dog.jpg + CUDA-version: 10000 (10000), cuDNN: 7.4.2, CUDNN_HALF=1, GPU count: 1 + CUDNN_HALF=1 + OpenCV version: 4.2.0 + 0 : compute_capability = 750, cudnn_half = 1, GPU: GeForce RTX 2070 +net.optimized_memory = 0 +mini_batch = 1, batch = 8, time_steps = 1, train = 0 + layer filters size/strd(dil) input output +``` diff --git a/.github/workflows/ccpp.yml b/.github/workflows/ccpp.yml new file mode 100644 index 00000000000..374a19f96f6 --- /dev/null +++ b/.github/workflows/ccpp.yml @@ -0,0 +1,700 @@ +name: Darknet Continuous Integration + +on: + push: + workflow_dispatch: + inputs: + debug_enabled: + type: boolean + description: 'Run the build with tmate debugging enabled (https://github.com/marketplace/actions/debugging-with-tmate)' + required: false + default: false + schedule: + - cron: '0 0 * * *' + +env: + VCPKG_BINARY_SOURCES: 'clear;nuget,vcpkgbinarycache,readwrite' + VCPKG_FORCE_DOWNLOADED_BINARIES: "TRUE" + +jobs: + ubuntu-makefile: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y libopencv-dev + sudo apt-get install -y mono-devel zlib1g + sudo apt-get install -y libgles2-mesa-dev + + - name: Clean downloads + run: sudo apt-get clean + + - name: 'Install CUDA' + run: ${{ github.workspace }}/scripts/deploy-cuda.sh + + - name: 'Create softlinks for CUDA' + run: | + source ${{ github.workspace }}/scripts/requested_cuda_version.sh + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so.1 + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/libcuda.so.1 + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/libcuda.so + + - name: 'LIBSO=1 GPU=0 CUDNN=0 OPENCV=0' + run: | + make LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 -j 8 + make clean + - name: 'LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 DEBUG=1' + run: | + make LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 DEBUG=1 -j 8 + make clean + - name: 'LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 AVX=1' + run: | + make LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 AVX=1 -j 8 + make clean + - name: 'LIBSO=1 GPU=0 CUDNN=0 OPENCV=1' + run: | + make LIBSO=1 GPU=0 CUDNN=0 OPENCV=1 -j 8 + make clean + - name: 'LIBSO=1 GPU=1 CUDNN=1 OPENCV=1' + run: | + export PATH=/usr/local/cuda/bin:$PATH + export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH + make LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 -j 8 + make clean + - name: 'LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 CUDNN_HALF=1' + run: | + export PATH=/usr/local/cuda/bin:$PATH + export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH + make LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 CUDNN_HALF=1 -j 8 + make clean + - name: 'LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 CUDNN_HALF=1 USE_CPP=1' + run: | + export PATH=/usr/local/cuda/bin:$PATH + export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH + make LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 CUDNN_HALF=1 USE_CPP=1 -j 8 + make clean + + + ubuntu-vcpkg-opencv4-cuda: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Setup tmate session + uses: mxschmitt/action-tmate@v3 + if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.debug_enabled }} + + - uses: lukka/get-cmake@latest + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y mono-devel zlib1g + sudo apt-get install -y libgles2-mesa-dev + sudo apt-get install -y --no-install-recommends yasm nasm gperf automake autoconf libtool pkg-config autoconf-archive libx11-dev libxft-dev libxext-dev libxrandr-dev libxi-dev libxcursor-dev libxdamage-dev libxinerama-dev libdbus-1-dev libxtst-dev libltdl-dev + + - name: Clean downloads + run: sudo apt-get clean + + - name: 'Install CUDA' + run: ${{ github.workspace }}/scripts/deploy-cuda.sh + + - name: 'Create softlinks for CUDA' + run: | + source ${{ github.workspace }}/scripts/requested_cuda_version.sh + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so.1 + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/libcuda.so.1 + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/libcuda.so + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone --depth 1 https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: Setup NuGet API key if found + shell: bash + env: + BAGET_API_KEY: ${{ secrets.BAGET_API_KEY }} + if: env.BAGET_API_KEY != null + run: > + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) + setapikey ${{ secrets.BAGET_API_KEY }} + -Source http://93.49.111.10:5555/v3/index.json + + - name: 'Build' + shell: pwsh + env: + CUDACXX: "/usr/local/cuda/bin/nvcc" + CUDA_PATH: "/usr/local/cuda" + CUDA_TOOLKIT_ROOT_DIR: "/usr/local/cuda" + LD_LIBRARY_PATH: "/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH" + run: ${{ github.workspace }}/CCM/build.ps1 -UseVCPKG -DoNotUpdateVCPKG -EnableOPENCV -EnableCUDA -EnableCUDNN -DisableInteractive -DoNotUpdateTOOL -BuildInstaller + + - uses: actions/upload-artifact@v4 + with: + name: cfg + path: cfg + - uses: actions/upload-artifact@v4 + with: + name: data + path: data + - uses: actions/upload-artifact@v4 + with: + name: darknet-vcpkg-cuda-${{ runner.os }}-darknet + path: ${{ github.workspace }}/*dark* + - uses: actions/upload-artifact@v4 + with: + name: darknet-vcpkg-cuda-${{ runner.os }}-uselib + path: ${{ github.workspace }}/uselib* + + + ubuntu-vcpkg-opencv3: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y mono-devel zlib1g + sudo apt-get install -y libgles2-mesa-dev + sudo apt-get install -y --no-install-recommends yasm nasm gperf automake autoconf libtool pkg-config autoconf-archive libx11-dev libxft-dev libxext-dev libxrandr-dev libxi-dev libxcursor-dev libxdamage-dev libxinerama-dev libdbus-1-dev libxtst-dev libltdl-dev + + - name: Clean downloads + run: sudo apt-get clean + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: Setup NuGet API key if found + shell: bash + env: + BAGET_API_KEY: ${{ secrets.BAGET_API_KEY }} + if: env.BAGET_API_KEY != null + run: > + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) + setapikey ${{ secrets.BAGET_API_KEY }} + -Source http://93.49.111.10:5555/v3/index.json + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -UseVCPKG -DoNotUpdateVCPKG -EnableOPENCV -ForceOpenCVVersion 3 -DisableInteractive -DoNotUpdateTOOL + + + ubuntu-vcpkg-opencv2: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y mono-devel zlib1g + sudo apt-get install -y libgles2-mesa-dev + sudo apt-get install -y --no-install-recommends yasm nasm gperf automake autoconf libtool pkg-config autoconf-archive libx11-dev libxft-dev libxext-dev libxrandr-dev libxi-dev libxcursor-dev libxdamage-dev libxinerama-dev libdbus-1-dev libxtst-dev libltdl-dev + + - name: Clean downloads + run: sudo apt-get clean + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: Setup NuGet API key if found + shell: bash + env: + BAGET_API_KEY: ${{ secrets.BAGET_API_KEY }} + if: env.BAGET_API_KEY != null + run: > + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) + setapikey ${{ secrets.BAGET_API_KEY }} + -Source http://93.49.111.10:5555/v3/index.json + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -UseVCPKG -DoNotUpdateVCPKG -EnableOPENCV -ForceOpenCVVersion 2 -DisableInteractive -DoNotUpdateTOOL + + + ubuntu: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y libopencv-dev + sudo apt-get install -y mono-devel zlib1g + + - name: Clean downloads + run: sudo apt-get clean + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: pwsh + env: + CUDACXX: "/usr/local/cuda/bin/nvcc" + CUDA_PATH: "/usr/local/cuda" + CUDA_TOOLKIT_ROOT_DIR: "/usr/local/cuda" + LD_LIBRARY_PATH: "/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH" + run: ${{ github.workspace }}/CCM/build.ps1 -EnableOPENCV -DisableInteractive -DoNotUpdateTOOL + + - uses: actions/upload-artifact@v4 + with: + name: darknet-${{ runner.os }}-darknet + path: ${{ github.workspace }}/*dark* + - uses: actions/upload-artifact@v4 + with: + name: darknet-${{ runner.os }}-uselib + path: ${{ github.workspace }}/uselib* + + + ubuntu-cuda: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y libopencv-dev + sudo apt-get install -y mono-devel zlib1g + sudo apt-get install -y libgles2-mesa-dev + + - name: Clean downloads + run: sudo apt-get clean + + - uses: lukka/get-cmake@latest + + - name: 'Install CUDA' + run: ${{ github.workspace }}/scripts/deploy-cuda.sh + + - name: 'Create softlinks for CUDA' + run: | + source ${{ github.workspace }}/scripts/requested_cuda_version.sh + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so.1 + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/libcuda.so.1 + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/libcuda.so + + - name: 'Build' + shell: pwsh + env: + CUDACXX: "/usr/local/cuda/bin/nvcc" + CUDA_PATH: "/usr/local/cuda" + CUDA_TOOLKIT_ROOT_DIR: "/usr/local/cuda" + LD_LIBRARY_PATH: "/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH" + run: ${{ github.workspace }}/CCM/build.ps1 -EnableOPENCV -EnableCUDA -EnableCUDNN -DisableInteractive -DoNotUpdateTOOL + + - uses: actions/upload-artifact@v4 + with: + name: darknet-cuda-${{ runner.os }}-darknet + path: ${{ github.workspace }}/*dark* + - uses: actions/upload-artifact@v4 + with: + name: darknet-cuda-${{ runner.os }}-uselib + path: ${{ github.workspace }}/uselib* + + + ubuntu-no-ocv-cpp: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -DisableInteractive -DoNotUpdateTOOL -AdditionalBuildSetup "-DBUILD_AS_CPP:BOOL=ON" + + - name: Test on data/dog.jpg + shell: bash + run: > + wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights -O ${{ github.workspace }}/yolov4-tiny.weights; + ${{ github.workspace }}/build_release/darknet detect ${{ github.workspace }}/cfg/yolov4-tiny.cfg ${{ github.workspace }}/yolov4-tiny.weights ${{ github.workspace }}/data/dog.jpg -dont_show + + + ubuntu-setup-sh: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y mono-devel + + - name: Clean downloads + run: sudo apt-get clean + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone --depth 1 https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: Setup NuGet API key if found + shell: bash + env: + BAGET_API_KEY: ${{ secrets.BAGET_API_KEY }} + if: env.BAGET_API_KEY != null + run: > + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) + setapikey ${{ secrets.BAGET_API_KEY }} + -Source http://93.49.111.10:5555/v3/index.json + + - name: 'Setup' + shell: bash + run: ${{ github.workspace }}/scripts/setup.sh -InstallTOOLS -InstallCUDA -BypassDRIVER + + + osx-vcpkg: + runs-on: macos-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Setup tmate session + uses: mxschmitt/action-tmate@v3 + if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.debug_enabled }} + + - name: Install dependencies + run: brew install libomp yasm nasm pkg-config automake autoconf-archive mono + + - uses: lukka/get-cmake@latest + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone --depth 1 https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: Setup NuGet API key if found + shell: bash + env: + BAGET_API_KEY: ${{ secrets.BAGET_API_KEY }} + if: env.BAGET_API_KEY != null + run: > + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) + setapikey ${{ secrets.BAGET_API_KEY }} + -Source http://93.49.111.10:5555/v3/index.json + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -UseVCPKG -DoNotUpdateVCPKG -EnableOPENCV -DisableInteractive -DoNotUpdateTOOL -BuildInstaller + + - uses: actions/upload-artifact@v4 + with: + name: darknet-vcpkg-${{ runner.os }}-darknet + path: ${{ github.workspace }}/*dark* + - uses: actions/upload-artifact@v4 + with: + name: darknet-vcpkg-${{ runner.os }}-uselib + path: ${{ github.workspace }}/uselib* + + + osx: + runs-on: macos-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Install dependencies + run: brew install opencv libomp + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -EnableOPENCV -DisableInteractive -DoNotUpdateTOOL + + - uses: actions/upload-artifact@v4 + with: + name: darknet-${{ runner.os }}-darknet + path: ${{ github.workspace }}/*dark* + - uses: actions/upload-artifact@v4 + with: + name: darknet-${{ runner.os }}-uselib + path: ${{ github.workspace }}/uselib* + + + osx-no-ocv-no-omp-cpp: + runs-on: macos-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -DisableInteractive -DoNotUpdateTOOL -AdditionalBuildSetup "-DBUILD_AS_CPP:BOOL=ON" + + + win-vcpkg: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Setup tmate session + uses: mxschmitt/action-tmate@v3 + if: ${{ github.event_name == 'workflow_dispatch' && github.event.inputs.debug_enabled }} + + - uses: lukka/get-cmake@latest + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone --depth 1 https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: Setup NuGet API key if found + shell: bash + env: + BAGET_API_KEY: ${{ secrets.BAGET_API_KEY }} + if: env.BAGET_API_KEY != null + run: > + $(./vcpkg/vcpkg fetch nuget | tail -n 1) + setapikey ${{ secrets.BAGET_API_KEY }} + -Source http://93.49.111.10:5555/v3/index.json + + - name: Install NSIS + run: winget install --id NSIS.NSIS --accept-source-agreements --accept-package-agreements + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -UseVCPKG -ForceLocalVCPKG -DoNotUpdateVCPKG -EnableOPENCV -DisableInteractive -DoNotUpdateTOOL -BuildInstaller + + - uses: actions/upload-artifact@v4 + with: + name: darknet-vcpkg-${{ runner.os }}-darknet + path: ${{ github.workspace }}/*dark* + - uses: actions/upload-artifact@v4 + with: + name: darknet-vcpkg-${{ runner.os }}-dlls + path: ${{ github.workspace }}/build_release/*.dll + - uses: actions/upload-artifact@v4 + with: + name: darknet-vcpkg-${{ runner.os }}-uselib + path: ${{ github.workspace }}/uselib* + + + win-intlibs: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -DisableInteractive -DoNotUpdateTOOL + + - uses: actions/upload-artifact@v4 + with: + name: darknet-${{ runner.os }}-intlibs-darknet + path: ${{ github.workspace }}/*dark* + - uses: actions/upload-artifact@v4 + with: + name: darknet-${{ runner.os }}-intlibs-dlls + path: ${{ github.workspace }}/3rdparty/pthreads/bin/*.dll + - uses: actions/upload-artifact@v4 + with: + name: darknet-${{ runner.os }}-intlibs-uselib + path: ${{ github.workspace }}/uselib* + + + win-setup-ps1: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone --depth 1 https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: Setup NuGet API key if found + shell: bash + env: + BAGET_API_KEY: ${{ secrets.BAGET_API_KEY }} + if: env.BAGET_API_KEY != null + run: > + $(./vcpkg/vcpkg fetch nuget | tail -n 1) + setapikey ${{ secrets.BAGET_API_KEY }} + -Source http://93.49.111.10:5555/v3/index.json + + - name: 'Setup' + shell: pwsh + run: ${{ github.workspace }}/scripts/setup.ps1 -InstallCUDA + + + win-vcpkg-base-cpp: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone --depth 1 https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: Setup NuGet API key if found + shell: bash + env: + BAGET_API_KEY: ${{ secrets.BAGET_API_KEY }} + if: env.BAGET_API_KEY != null + run: > + $(./vcpkg/vcpkg fetch nuget | tail -n 1) + setapikey ${{ secrets.BAGET_API_KEY }} + -Source http://93.49.111.10:5555/v3/index.json + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -UseVCPKG -ForceLocalVCPKG -DoNotUpdateVCPKG -DisableInteractive -DoNotUpdateTOOL -AdditionalBuildSetup "-DBUILD_AS_CPP:BOOL=ON" + + - name: Download yolov4-tiny.weights + run: curl.exe -L https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights -o ${{ github.workspace }}\yolov4-tiny.weights + - name: Test on data/dog.jpg + run: ${{ github.workspace }}\build_release\darknet.exe detect ${{ github.workspace }}\cfg\yolov4-tiny.cfg ${{ github.workspace }}\yolov4-tiny.weights ${{ github.workspace }}\data\dog.jpg + + + win-csharp: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -DisableInteractive -DoNotUpdateTOOL -DoNotUseNinja -AdditionalBuildSetup "-DENABLE_CSHARP_WRAPPER:BOOL=ON" + + + win-intlibs-cuda: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + - name: 'Install CUDA' + run: ${{ github.workspace }}/scripts/deploy-cuda.ps1 + + - uses: lukka/get-cmake@latest + + - name: 'Build' + env: + CUDA_PATH: "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.6" + CUDA_TOOLKIT_ROOT_DIR: "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.6" + CUDACXX: "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.6\\bin\\nvcc.exe" + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -EnableCUDA -DisableInteractive -DoNotUpdateTOOL + + + win-powershell51: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: powershell + run: ${{ github.workspace }}/CCM/build.ps1 -DisableInteractive -DoNotUpdateTOOL + + + mingw: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Build with CMake' + run: | + mkdir build_release + cd build_release + cmake .. -G"MinGW Makefiles" -DCMAKE_BUILD_TYPE=Release -DENABLE_CUDA=OFF -DENABLE_CUDNN=OFF -DENABLE_OPENCV=OFF + cmake --build . --config Release --target install diff --git a/.github/workflows/on_pr.yml b/.github/workflows/on_pr.yml new file mode 100644 index 00000000000..1297c14fbcb --- /dev/null +++ b/.github/workflows/on_pr.yml @@ -0,0 +1,519 @@ +name: Darknet Pull Requests + +on: [pull_request] + +env: + VCPKG_BINARY_SOURCES: 'clear;nuget,vcpkgbinarycache,read' + VCPKG_FORCE_DOWNLOADED_BINARIES: "TRUE" + +jobs: + ubuntu-makefile: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y libopencv-dev + sudo apt-get install -y mono-devel zlib1g + sudo apt-get install -y libgles2-mesa-dev + + - name: Clean downloads + run: sudo apt-get clean + + - name: 'Install CUDA' + run: ${{ github.workspace }}/scripts/deploy-cuda.sh + + - name: 'Create softlinks for CUDA' + run: | + source ${{ github.workspace }}/scripts/requested_cuda_version.sh + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so.1 + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/libcuda.so.1 + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/libcuda.so + + - name: 'LIBSO=1 GPU=0 CUDNN=0 OPENCV=0' + run: | + make LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 -j 8 + make clean + - name: 'LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 DEBUG=1' + run: | + make LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 DEBUG=1 -j 8 + make clean + - name: 'LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 AVX=1' + run: | + make LIBSO=1 GPU=0 CUDNN=0 OPENCV=0 AVX=1 -j 8 + make clean + - name: 'LIBSO=1 GPU=0 CUDNN=0 OPENCV=1' + run: | + make LIBSO=1 GPU=0 CUDNN=0 OPENCV=1 -j 8 + make clean + - name: 'LIBSO=1 GPU=1 CUDNN=1 OPENCV=1' + run: | + export PATH=/usr/local/cuda/bin:$PATH + export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH + make LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 -j 8 + make clean + - name: 'LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 CUDNN_HALF=1' + run: | + export PATH=/usr/local/cuda/bin:$PATH + export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH + make LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 CUDNN_HALF=1 -j 8 + make clean + - name: 'LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 CUDNN_HALF=1 USE_CPP=1' + run: | + export PATH=/usr/local/cuda/bin:$PATH + export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH + make LIBSO=1 GPU=1 CUDNN=1 OPENCV=1 CUDNN_HALF=1 USE_CPP=1 -j 8 + make clean + + + ubuntu-vcpkg-opencv4-cuda: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y mono-devel zlib1g + sudo apt-get install -y libgles2-mesa-dev + sudo apt-get install -y --no-install-recommends yasm nasm gperf automake autoconf libtool pkg-config autoconf-archive libx11-dev libxft-dev libxext-dev libxrandr-dev libxi-dev libxcursor-dev libxdamage-dev libxinerama-dev libdbus-1-dev libxtst-dev libltdl-dev + + - name: Clean downloads + run: sudo apt-get clean + + - name: 'Install CUDA' + run: ${{ github.workspace }}/scripts/deploy-cuda.sh + + - name: 'Create softlinks for CUDA' + run: | + source ${{ github.workspace }}/scripts/requested_cuda_version.sh + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so.1 + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/libcuda.so.1 + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/libcuda.so + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone --depth 1 https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: 'Build' + shell: pwsh + env: + CUDACXX: "/usr/local/cuda/bin/nvcc" + CUDA_PATH: "/usr/local/cuda" + CUDA_TOOLKIT_ROOT_DIR: "/usr/local/cuda" + LD_LIBRARY_PATH: "/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH" + run: ${{ github.workspace }}/CCM/build.ps1 -UseVCPKG -DoNotUpdateVCPKG -EnableOPENCV -EnableCUDA -EnableCUDNN -DisableInteractive -DoNotUpdateTOOL -BuildInstaller + + + ubuntu-vcpkg-opencv3: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y mono-devel zlib1g + sudo apt-get install -y libgles2-mesa-dev + sudo apt-get install -y --no-install-recommends yasm nasm gperf automake autoconf libtool pkg-config autoconf-archive libx11-dev libxft-dev libxext-dev libxrandr-dev libxi-dev libxcursor-dev libxdamage-dev libxinerama-dev libdbus-1-dev libxtst-dev libltdl-dev + + - name: Clean downloads + run: sudo apt-get clean + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -UseVCPKG -DoNotUpdateVCPKG -EnableOPENCV -ForceOpenCVVersion 3 -DisableInteractive -DoNotUpdateTOOL + + + ubuntu-vcpkg-opencv2: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y mono-devel zlib1g + sudo apt-get install -y libgles2-mesa-dev + sudo apt-get install -y --no-install-recommends yasm nasm gperf automake autoconf libtool pkg-config autoconf-archive libx11-dev libxft-dev libxext-dev libxrandr-dev libxi-dev libxcursor-dev libxdamage-dev libxinerama-dev libdbus-1-dev libxtst-dev libltdl-dev + + - name: Clean downloads + run: sudo apt-get clean + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -UseVCPKG -DoNotUpdateVCPKG -EnableOPENCV -ForceOpenCVVersion 2 -DisableInteractive -DoNotUpdateTOOL + + + ubuntu: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y libopencv-dev + sudo apt-get install -y mono-devel zlib1g + + - name: Clean downloads + run: sudo apt-get clean + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: pwsh + env: + CUDACXX: "/usr/local/cuda/bin/nvcc" + CUDA_PATH: "/usr/local/cuda" + CUDA_TOOLKIT_ROOT_DIR: "/usr/local/cuda" + LD_LIBRARY_PATH: "/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH" + run: ${{ github.workspace }}/CCM/build.ps1 -EnableOPENCV -DisableInteractive -DoNotUpdateTOOL + + + ubuntu-cuda: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y libopencv-dev + sudo apt-get install -y mono-devel zlib1g + sudo apt-get install -y libgles2-mesa-dev + + - name: Clean downloads + run: sudo apt-get clean + + - uses: lukka/get-cmake@latest + + - name: 'Install CUDA' + run: ${{ github.workspace }}/scripts/deploy-cuda.sh + + - name: 'Create softlinks for CUDA' + run: | + source ${{ github.workspace }}/scripts/requested_cuda_version.sh + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so.1 + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/libcuda.so.1 + sudo ln -s /usr/local/cuda-${CUDA_VERSION}/lib64/stubs/libcuda.so /usr/local/cuda-${CUDA_VERSION}/lib64/libcuda.so + + - name: 'Build' + shell: pwsh + env: + CUDACXX: "/usr/local/cuda/bin/nvcc" + CUDA_PATH: "/usr/local/cuda" + CUDA_TOOLKIT_ROOT_DIR: "/usr/local/cuda" + LD_LIBRARY_PATH: "/usr/local/cuda/lib64:/usr/local/cuda/lib64/stubs:$LD_LIBRARY_PATH" + run: ${{ github.workspace }}/CCM/build.ps1 -EnableOPENCV -EnableCUDA -EnableCUDNN -DisableInteractive -DoNotUpdateTOOL + + + ubuntu-no-ocv-cpp: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -DisableInteractive -DoNotUpdateTOOL -AdditionalBuildSetup "-DBUILD_AS_CPP:BOOL=ON" + + - name: Test on data/dog.jpg + shell: bash + run: > + wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights -O ${{ github.workspace }}/yolov4-tiny.weights; + ${{ github.workspace }}/build_release/darknet detect ${{ github.workspace }}/cfg/yolov4-tiny.cfg ${{ github.workspace }}/yolov4-tiny.weights ${{ github.workspace }}/data/dog.jpg -dont_show + + + ubuntu-setup-sh: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Update apt & install dependencies + run: | + sudo apt-get update + sudo apt-get install -y mono-devel + + - name: Clean downloads + run: sudo apt-get clean + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone --depth 1 https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: 'Setup' + shell: bash + run: ${{ github.workspace }}/scripts/setup.sh -InstallTOOLS -InstallCUDA -BypassDRIVER + + + osx-vcpkg: + runs-on: macos-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Install dependencies + run: brew install libomp yasm nasm pkg-config automake autoconf-archive mono + + - uses: lukka/get-cmake@latest + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone --depth 1 https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + mono $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -UseVCPKG -DoNotUpdateVCPKG -EnableOPENCV -DisableInteractive -DoNotUpdateTOOL -BuildInstaller + + + osx: + runs-on: macos-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: Install dependencies + run: brew install opencv libomp + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -EnableOPENCV -DisableInteractive -DoNotUpdateTOOL + + + osx-no-ocv-no-omp-cpp: + runs-on: macos-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -DisableInteractive -DoNotUpdateTOOL -AdditionalBuildSetup "-DBUILD_AS_CPP:BOOL=ON" + + + win-vcpkg: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone --depth 1 https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: Install NSIS + run: winget install --id NSIS.NSIS --accept-source-agreements --accept-package-agreements + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -UseVCPKG -ForceLocalVCPKG -DoNotUpdateVCPKG -EnableOPENCV -DisableInteractive -DoNotUpdateTOOL -BuildInstaller + + + win-intlibs: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -DisableInteractive -DoNotUpdateTOOL + + + win-setup-ps1: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone --depth 1 https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: 'Setup' + shell: pwsh + run: ${{ github.workspace }}/scripts/setup.ps1 -InstallCUDA + + + win-vcpkg-base-cpp: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Setup vcpkg and NuGet artifacts backend' + shell: bash + run: > + git clone --depth 1 https://github.com/microsoft/vcpkg ; + ./vcpkg/bootstrap-vcpkg.sh ; + $(./vcpkg/vcpkg fetch nuget | tail -n 1) sources add + -Name "vcpkgbinarycache" + -Source http://93.49.111.10:5555/v3/index.json + -AllowInsecureConnections + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -UseVCPKG -ForceLocalVCPKG -DoNotUpdateVCPKG -DisableInteractive -DoNotUpdateTOOL -AdditionalBuildSetup "-DBUILD_AS_CPP:BOOL=ON" + + - name: Download yolov4-tiny.weights + run: curl.exe -L https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights -o ${{ github.workspace }}\yolov4-tiny.weights + - name: Test on data/dog.jpg + run: ${{ github.workspace }}\build_release\darknet.exe detect ${{ github.workspace }}\cfg\yolov4-tiny.cfg ${{ github.workspace }}\yolov4-tiny.weights ${{ github.workspace }}\data\dog.jpg + + + win-csharp: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -DisableInteractive -DoNotUpdateTOOL -DoNotUseNinja -AdditionalBuildSetup "-DENABLE_CSHARP_WRAPPER:BOOL=ON" + + + win-intlibs-cuda: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + - name: 'Install CUDA' + run: ${{ github.workspace }}/scripts/deploy-cuda.ps1 + + - uses: lukka/get-cmake@latest + + - name: 'Build' + env: + CUDA_PATH: "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.6" + CUDA_TOOLKIT_ROOT_DIR: "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.6" + CUDACXX: "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v12.6\\bin\\nvcc.exe" + shell: pwsh + run: ${{ github.workspace }}/CCM/build.ps1 -EnableCUDA -DisableInteractive -DoNotUpdateTOOL + + + win-powershell51: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Build' + shell: powershell + run: ${{ github.workspace }}/CCM/build.ps1 -DisableInteractive -DoNotUpdateTOOL + + + mingw: + runs-on: windows-latest + steps: + - uses: actions/checkout@v4 + with: + submodules: 'true' + + - uses: lukka/get-cmake@latest + + - name: 'Build with CMake' + run: | + mkdir build_release + cd build_release + cmake .. -G"MinGW Makefiles" -DCMAKE_BUILD_TYPE=Release -DENABLE_CUDA=OFF -DENABLE_CUDNN=OFF -DENABLE_OPENCV=OFF + cmake --build . --config Release --target install diff --git a/.gitignore b/.gitignore index bbfed659be5..3166d82c7b3 100644 --- a/.gitignore +++ b/.gitignore @@ -1,9 +1,17 @@ *.o +*.a *.dSYM *.csv *.out *.png +*.jpg *.so +*.exe +*.dll +*.lib +*.dylib +*.pyc +*.bak mnist/ data/ caffe/ @@ -14,8 +22,30 @@ convnet/ decaf/ submission/ cfg/ -darknet +temp/ +build/darknet/* +build_*/ +debug/ +ninja/ +ninja.zip +vcpkg_installed/ +!build/darknet/YoloWrapper.cs .fuse* +*.weights +build/*.cmake +build/*.ninja +build/*.txt +build/*.json +build/CMakeFiles/ +build/detect_cuda_compute_capabilities.cu +build/.ninja_deps +build/.ninja_log +build/Makefile +CMakeFiles/ +CMakeCache.txt +*/vcpkg-manifest-install.log +build.log +__pycache__/ # OS Generated # .DS_Store* @@ -24,6 +54,19 @@ Icon? Thumbs.db *.swp -# CMake # -cmake-build-debug/ -CMakeLists.txt +# IDE generated # +.vs/ +.vscode/ + +# Managed by CMake +src/version.h + +# Build artifacts +lib/ +share/ +include/darknet/ +uselib +uselib_track +darknet +kmeansiou +vcpkg/ diff --git a/.gitmodules b/.gitmodules new file mode 100644 index 00000000000..7530075951e --- /dev/null +++ b/.gitmodules @@ -0,0 +1,3 @@ +[submodule "CCM"] + path = CCM + url = https://github.com/cenit/CCM diff --git a/3rdparty/dll/x86/pthreadGC2.dll b/3rdparty/dll/x86/pthreadGC2.dll deleted file mode 100644 index 67b9289dfd9..00000000000 Binary files a/3rdparty/dll/x86/pthreadGC2.dll and /dev/null differ diff --git a/3rdparty/dll/x86/pthreadGCE2.dll b/3rdparty/dll/x86/pthreadGCE2.dll deleted file mode 100644 index 9e18ea24b8b..00000000000 Binary files a/3rdparty/dll/x86/pthreadGCE2.dll and /dev/null differ diff --git a/3rdparty/dll/x86/pthreadVC2.dll b/3rdparty/dll/x86/pthreadVC2.dll deleted file mode 100644 index fcb5d9dcc1c..00000000000 Binary files a/3rdparty/dll/x86/pthreadVC2.dll and /dev/null differ diff --git a/3rdparty/dll/x86/pthreadVCE2.dll b/3rdparty/dll/x86/pthreadVCE2.dll deleted file mode 100644 index 9d148cc0dae..00000000000 Binary files a/3rdparty/dll/x86/pthreadVCE2.dll and /dev/null differ diff --git a/3rdparty/dll/x86/pthreadVSE2.dll b/3rdparty/dll/x86/pthreadVSE2.dll deleted file mode 100644 index 8129116fd41..00000000000 Binary files a/3rdparty/dll/x86/pthreadVSE2.dll and /dev/null differ diff --git a/3rdparty/getopt/getopt.c b/3rdparty/getopt/getopt.c new file mode 100644 index 00000000000..61aa096ab5d --- /dev/null +++ b/3rdparty/getopt/getopt.c @@ -0,0 +1,498 @@ +#ifdef _MSC_VER +#include "getopt.h" + +#ifdef __cplusplus +extern "C" { +#endif + +#ifdef REPLACE_GETOPT +int opterr = 1; /* if error message should be printed */ +int optind = 1; /* index into parent argv vector */ +int optopt = '?'; /* character checked for validity */ +#undef optreset /* see getopt.h */ +#define optreset __mingw_optreset +int optreset; /* reset getopt */ +char* optarg; /* argument associated with option */ +#endif + +static void +_vwarnx(const char* fmt, va_list ap) +{ + (void)fprintf(stderr, "%s: ", __progname); + if (fmt != NULL) + (void)vfprintf(stderr, fmt, ap); + (void)fprintf(stderr, "\n"); +} + +static void +warnx(const char* fmt, ...) +{ + va_list ap; + va_start(ap, fmt); + _vwarnx(fmt, ap); + va_end(ap); +} + +/* + * Compute the greatest common divisor of a and b. + */ +static int +gcd(int a, int b) +{ + int c; + + c = a % b; + while (c != 0) { + a = b; + b = c; + c = a % b; + } + + return (b); +} + +/* + * Exchange the block from nonopt_start to nonopt_end with the block + * from nonopt_end to opt_end (keeping the same order of arguments + * in each block). + */ +static void +permute_args(int panonopt_start, int panonopt_end, int opt_end, + char* const* nargv) +{ + int cstart, cyclelen, i, j, ncycle, nnonopts, nopts, pos; + char* swap; + + /* + * compute lengths of blocks and number and size of cycles + */ + nnonopts = panonopt_end - panonopt_start; + nopts = opt_end - panonopt_end; + ncycle = gcd(nnonopts, nopts); + cyclelen = (opt_end - panonopt_start) / ncycle; + + for (i = 0; i < ncycle; i++) { + cstart = panonopt_end + i; + pos = cstart; + for (j = 0; j < cyclelen; j++) { + if (pos >= panonopt_end) + pos -= nnonopts; + else + pos += nopts; + swap = nargv[pos]; + /* LINTED const cast */ + ((char**)nargv)[pos] = nargv[cstart]; + /* LINTED const cast */ + ((char**)nargv)[cstart] = swap; + } + } +} + +#ifdef REPLACE_GETOPT +/* + * getopt -- + * Parse argc/argv argument vector. + * + * [eventually this will replace the BSD getopt] + */ +int getopt(int nargc, char* const* nargv, const char* options) +{ + + /* + * We don't pass FLAG_PERMUTE to getopt_internal() since + * the BSD getopt(3) (unlike GNU) has never done this. + * + * Furthermore, since many privileged programs call getopt() + * before dropping privileges it makes sense to keep things + * as simple (and bug-free) as possible. + */ + return (getopt_internal(nargc, nargv, options, NULL, NULL, 0)); +} +#endif /* REPLACE_GETOPT */ + +//extern int getopt(int nargc, char * const *nargv, const char *options); + +#ifdef __cplusplus +} +#endif +/* + * POSIX requires the `getopt' API to be specified in `unistd.h'; + * thus, `unistd.h' includes this header. However, we do not want + * to expose the `getopt_long' or `getopt_long_only' APIs, when + * included in this manner. Thus, close the standard __GETOPT_H__ + * declarations block, and open an additional __GETOPT_LONG_H__ + * specific block, only when *not* __UNISTD_H_SOURCED__, in which + * to declare the extended API. + */ + +#ifdef __cplusplus +extern "C" { +#endif + +struct option /* specification for a long form option... */ +{ + const char* name; /* option name, without leading hyphens */ + int has_arg; /* does it take an argument? */ + int* flag; /* where to save its status, or NULL */ + int val; /* its associated status value */ +}; + +enum /* permitted values for its `has_arg' field... */ +{ + no_argument = 0, /* option never takes an argument */ + required_argument, /* option always requires an argument */ + optional_argument /* option may take an argument */ +}; + +/* + * parse_long_options -- + * Parse long options in argc/argv argument vector. + * Returns -1 if short_too is set and the option does not match long_options. + */ +static int +parse_long_options(char* const* nargv, const char* options, + const struct option* long_options, int* idx, int short_too) +{ + char *current_argv, *has_equal; + size_t current_argv_len; + int i, ambiguous, match; + +#define IDENTICAL_INTERPRETATION(_x, _y) \ + (long_options[(_x)].has_arg == long_options[(_y)].has_arg && long_options[(_x)].flag == long_options[(_y)].flag && long_options[(_x)].val == long_options[(_y)].val) + + current_argv = place; + match = -1; + ambiguous = 0; + + optind++; + + if ((has_equal = strchr(current_argv, '=')) != NULL) { + /* argument found (--option=arg) */ + current_argv_len = has_equal - current_argv; + has_equal++; + } else + current_argv_len = strlen(current_argv); + + for (i = 0; long_options[i].name; i++) { + /* find matching long option */ + if (strncmp(current_argv, long_options[i].name, + current_argv_len)) + continue; + + if (strlen(long_options[i].name) == current_argv_len) { + /* exact match */ + match = i; + ambiguous = 0; + break; + } + /* + * If this is a known short option, don't allow + * a partial match of a single character. + */ + if (short_too && current_argv_len == 1) + continue; + + if (match == -1) /* partial match */ + match = i; + else if (!IDENTICAL_INTERPRETATION(i, match)) + ambiguous = 1; + } + if (ambiguous) { + /* ambiguous abbreviation */ + if (PRINT_ERROR) + warnx(ambig, (int)current_argv_len, + current_argv); + optopt = 0; + return (BADCH); + } + if (match != -1) { /* option found */ + if (long_options[match].has_arg == no_argument + && has_equal) { + if (PRINT_ERROR) + warnx(noarg, (int)current_argv_len, + current_argv); + /* + * XXX: GNU sets optopt to val regardless of flag + */ + if (long_options[match].flag == NULL) + optopt = long_options[match].val; + else + optopt = 0; + return (BADARG); + } + if (long_options[match].has_arg == required_argument || long_options[match].has_arg == optional_argument) { + if (has_equal) + optarg = has_equal; + else if (long_options[match].has_arg == required_argument) { + /* + * optional argument doesn't use next nargv + */ + optarg = nargv[optind++]; + } + } + if ((long_options[match].has_arg == required_argument) + && (optarg == NULL)) { + /* + * Missing argument; leading ':' indicates no error + * should be generated. + */ + if (PRINT_ERROR) + warnx(recargstring, + current_argv); + /* + * XXX: GNU sets optopt to val regardless of flag + */ + if (long_options[match].flag == NULL) + optopt = long_options[match].val; + else + optopt = 0; + --optind; + return (BADARG); + } + } else { /* unknown option */ + if (short_too) { + --optind; + return (-1); + } + if (PRINT_ERROR) + warnx(illoptstring, current_argv); + optopt = 0; + return (BADCH); + } + if (idx) + *idx = match; + if (long_options[match].flag) { + *long_options[match].flag = long_options[match].val; + return (0); + } else + return (long_options[match].val); +#undef IDENTICAL_INTERPRETATION +} + +/* + * getopt_internal -- + * Parse argc/argv argument vector. Called by user level routines. + */ +static int +getopt_internal(int nargc, char* const* nargv, const char* options, + const struct option* long_options, int* idx, int flags) +{ + char* oli; /* option letter list index */ + int optchar, short_too; + static int posixly_correct = -1; + + if (options == NULL) + return (-1); + + /* + * XXX Some GNU programs (like cvs) set optind to 0 instead of + * XXX using optreset. Work around this braindamage. + */ + if (optind == 0) + optind = optreset = 1; + + /* + * Disable GNU extensions if POSIXLY_CORRECT is set or options + * string begins with a '+'. + * + * CV, 2009-12-14: Check POSIXLY_CORRECT anew if optind == 0 or + * optreset != 0 for GNU compatibility. + */ + if (posixly_correct == -1 || optreset != 0) + posixly_correct = (getenv("POSIXLY_CORRECT") != NULL); + if (*options == '-') + flags |= FLAG_ALLARGS; + else if (posixly_correct || *options == '+') + flags &= ~FLAG_PERMUTE; + if (*options == '+' || *options == '-') + options++; + + optarg = NULL; + if (optreset) + nonopt_start = nonopt_end = -1; +start: + if (optreset || !*place) { /* update scanning pointer */ + optreset = 0; + if (optind >= nargc) { /* end of argument vector */ + place = EMSG; + if (nonopt_end != -1) { + /* do permutation, if we have to */ + permute_args(nonopt_start, nonopt_end, + optind, nargv); + optind -= nonopt_end - nonopt_start; + } else if (nonopt_start != -1) { + /* + * If we skipped non-options, set optind + * to the first of them. + */ + optind = nonopt_start; + } + nonopt_start = nonopt_end = -1; + return (-1); + } + if (*(place = nargv[optind]) != '-' || (place[1] == '\0' && strchr(options, '-') == NULL)) { + place = EMSG; /* found non-option */ + if (flags & FLAG_ALLARGS) { + /* + * GNU extension: + * return non-option as argument to option 1 + */ + optarg = nargv[optind++]; + return (INORDER); + } + if (!(flags & FLAG_PERMUTE)) { + /* + * If no permutation wanted, stop parsing + * at first non-option. + */ + return (-1); + } + /* do permutation */ + if (nonopt_start == -1) + nonopt_start = optind; + else if (nonopt_end != -1) { + permute_args(nonopt_start, nonopt_end, + optind, nargv); + nonopt_start = optind - (nonopt_end - nonopt_start); + nonopt_end = -1; + } + optind++; + /* process next argument */ + goto start; + } + if (nonopt_start != -1 && nonopt_end == -1) + nonopt_end = optind; + + /* + * If we have "-" do nothing, if "--" we are done. + */ + if (place[1] != '\0' && *++place == '-' && place[1] == '\0') { + optind++; + place = EMSG; + /* + * We found an option (--), so if we skipped + * non-options, we have to permute. + */ + if (nonopt_end != -1) { + permute_args(nonopt_start, nonopt_end, + optind, nargv); + optind -= nonopt_end - nonopt_start; + } + nonopt_start = nonopt_end = -1; + return (-1); + } + } + + /* + * Check long options if: + * 1) we were passed some + * 2) the arg is not just "-" + * 3) either the arg starts with -- we are getopt_long_only() + */ + if (long_options != NULL && place != nargv[optind] && (*place == '-' || (flags & FLAG_LONGONLY))) { + short_too = 0; + if (*place == '-') + place++; /* --foo long option */ + else if (*place != ':' && strchr(options, *place) != NULL) + short_too = 1; /* could be short option too */ + + optchar = parse_long_options(nargv, options, long_options, + idx, short_too); + if (optchar != -1) { + place = EMSG; + return (optchar); + } + } + + if ((optchar = (int)*place++) == (int)':' || (optchar == (int)'-' && *place != '\0') || (oli = (char*)strchr(options, optchar)) == NULL) { + /* + * If the user specified "-" and '-' isn't listed in + * options, return -1 (non-option) as per POSIX. + * Otherwise, it is an unknown option character (or ':'). + */ + if (optchar == (int)'-' && *place == '\0') + return (-1); + if (!*place) + ++optind; + if (PRINT_ERROR) + warnx(illoptchar, optchar); + optopt = optchar; + return (BADCH); + } + if (long_options != NULL && optchar == 'W' && oli[1] == ';') { + /* -W long-option */ + if (*place) /* no space */ + /* NOTHING */; + else if (++optind >= nargc) { /* no arg */ + place = EMSG; + if (PRINT_ERROR) + warnx(recargchar, optchar); + optopt = optchar; + return (BADARG); + } else /* white space */ + place = nargv[optind]; + optchar = parse_long_options(nargv, options, long_options, + idx, 0); + place = EMSG; + return (optchar); + } + if (*++oli != ':') { /* doesn't take argument */ + if (!*place) + ++optind; + } else { /* takes (optional) argument */ + optarg = NULL; + if (*place) /* no white space */ + optarg = place; + else if (oli[1] != ':') { /* arg not optional */ + if (++optind >= nargc) { /* no arg */ + place = EMSG; + if (PRINT_ERROR) + warnx(recargchar, optchar); + optopt = optchar; + return (BADARG); + } else + optarg = nargv[optind]; + } + place = EMSG; + ++optind; + } + /* dump back option letter */ + return (optchar); +} + +/* + * getopt_long -- + * Parse argc/argv argument vector. + */ +int getopt_long(int nargc, char* const* nargv, const char* options, + const struct option* long_options, int* idx) +{ + + return (getopt_internal(nargc, nargv, options, long_options, idx, + FLAG_PERMUTE)); +} + +/* + * getopt_long_only -- + * Parse argc/argv argument vector. + */ +int getopt_long_only(int nargc, char* const* nargv, const char* options, + const struct option* long_options, int* idx) +{ + + return (getopt_internal(nargc, nargv, options, long_options, idx, + FLAG_PERMUTE | FLAG_LONGONLY)); +} + +//extern int getopt_long(int nargc, char * const *nargv, const char *options, +// const struct option *long_options, int *idx); +//extern int getopt_long_only(int nargc, char * const *nargv, const char *options, +// const struct option *long_options, int *idx); +/* + * Previous MinGW implementation had... + */ + +#ifdef __cplusplus +} +#endif +#endif diff --git a/3rdparty/getopt/getopt.h b/3rdparty/getopt/getopt.h new file mode 100644 index 00000000000..a9f913d4f3d --- /dev/null +++ b/3rdparty/getopt/getopt.h @@ -0,0 +1,228 @@ +#ifdef _MSC_VER +#ifndef __GETOPT_H__ +/** + * DISCLAIMER + * This file is part of the mingw-w64 runtime package. + * + * The mingw-w64 runtime package and its code is distributed in the hope that it + * will be useful but WITHOUT ANY WARRANTY. ALL WARRANTIES, EXPRESSED OR + * IMPLIED ARE HEREBY DISCLAIMED. This includes but is not limited to + * warranties of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. + */ +/* + * Copyright (c) 2002 Todd C. Miller + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + * + * Sponsored in part by the Defense Advanced Research Projects + * Agency (DARPA) and Air Force Research Laboratory, Air Force + * Materiel Command, USAF, under agreement number F39502-99-1-0512. + */ +/*- + * Copyright (c) 2000 The NetBSD Foundation, Inc. + * All rights reserved. + * + * This code is derived from software contributed to The NetBSD Foundation + * by Dieter Baron and Thomas Klausner. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * 1. Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * 2. Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * + * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS + * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED + * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR + * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS + * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR + * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF + * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS + * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN + * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) + * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE + * POSSIBILITY OF SUCH DAMAGE. + */ + +#define __GETOPT_H__ + +/* All the headers include this file. */ +#include +#include +#include +#include +#include +#include +#define WIN32_LEAN_AND_MEAN +#include + +#ifdef __cplusplus +extern "C" { +#endif + +#define REPLACE_GETOPT /* use this getopt as the system getopt(3) */ + +//extern int optind; /* index of first non-option in argv */ +//extern int optopt; /* single option character, as parsed */ +//extern int opterr; /* flag to enable built-in diagnostics... */ +// /* (user may set to zero, to suppress) */ +// +//extern char *optarg; /* pointer to argument of current option */ + +#define PRINT_ERROR ((opterr) && (*options != ':')) + +#define FLAG_PERMUTE 0x01 /* permute non-options to the end of argv */ +#define FLAG_ALLARGS 0x02 /* treat non-options as args to option "-1" */ +#define FLAG_LONGONLY 0x04 /* operate as getopt_long_only */ + +/* return values */ +#define BADCH (int)'?' +#define BADARG ((*options == ':') ? (int)':' : (int)'?') +#define INORDER (int)1 + +#ifndef __CYGWIN__ +#define __progname __argv[0] +#else +extern char __declspec(dllimport) * __progname; +#endif + +#ifdef __CYGWIN__ +static char EMSG[] = ""; +#else +#define EMSG "" +#endif + +static int getopt_internal(int, char* const*, const char*, + const struct option*, int*, int); +static int parse_long_options(char* const*, const char*, + const struct option*, int*, int); +static int gcd(int, int); +static void permute_args(int, int, int, char* const*); + +static char* place = EMSG; /* option letter processing */ + +/* XXX: set optreset to 1 rather than these two */ +static int nonopt_start = -1; /* first non option argument (for permute) */ +static int nonopt_end = -1; /* first option after non options (for permute) */ + +/* Error messages */ +static const char recargchar[] = "option requires an argument -- %c"; +static const char recargstring[] = "option requires an argument -- %s"; +static const char ambig[] = "ambiguous option -- %.*s"; +static const char noarg[] = "option doesn't take an argument -- %.*s"; +static const char illoptchar[] = "unknown option -- %c"; +static const char illoptstring[] = "unknown option -- %s"; + +static void _vwarnx(const char* fmt, va_list ap); + +static void warnx(const char* fmt, ...); + +/* + * Compute the greatest common divisor of a and b. + */ +static int gcd(int a, int b); + +/* + * Exchange the block from nonopt_start to nonopt_end with the block + * from nonopt_end to opt_end (keeping the same order of arguments + * in each block). + */ +static void permute_args(int panonopt_start, int panonopt_end, int opt_end, char* const* nargv); + +#ifdef REPLACE_GETOPT +/* + * getopt -- + * Parse argc/argv argument vector. + * + * [eventually this will replace the BSD getopt] + */ +int getopt(int nargc, char* const* nargv, const char* options); +#endif /* REPLACE_GETOPT */ + +//extern int getopt(int nargc, char * const *nargv, const char *options); + +#ifdef _BSD_SOURCE +/* + * BSD adds the non-standard `optreset' feature, for reinitialisation + * of `getopt' parsing. We support this feature, for applications which + * proclaim their BSD heritage, before including this header; however, + * to maintain portability, developers are advised to avoid it. + */ +#define optreset __mingw_optreset +extern int optreset; +#endif +#ifdef __cplusplus +} +#endif +/* + * POSIX requires the `getopt' API to be specified in `unistd.h'; + * thus, `unistd.h' includes this header. However, we do not want + * to expose the `getopt_long' or `getopt_long_only' APIs, when + * included in this manner. Thus, close the standard __GETOPT_H__ + * declarations block, and open an additional __GETOPT_LONG_H__ + * specific block, only when *not* __UNISTD_H_SOURCED__, in which + * to declare the extended API. + */ +#endif /* !defined(__GETOPT_H__) */ + +#if !defined(__UNISTD_H_SOURCED__) && !defined(__GETOPT_LONG_H__) +#define __GETOPT_LONG_H__ + +#ifdef __cplusplus +extern "C" { +#endif + +/* + * parse_long_options -- + * Parse long options in argc/argv argument vector. + * Returns -1 if short_too is set and the option does not match long_options. + */ +/* static int parse_long_options(char* const* nargv, const char* options, const struct option* long_options, int* idx, int short_too); */ + +/* + * getopt_internal -- + * Parse argc/argv argument vector. Called by user level routines. + */ +/* static int getopt_internal(int nargc, char* const* nargv, const char* options, const struct option* long_options, int* idx, int flags); */ + +/* + * getopt_long -- + * Parse argc/argv argument vector. + */ +int getopt_long(int nargc, char* const* nargv, const char* options, const struct option* long_options, int* idx); + +/* + * getopt_long_only -- + * Parse argc/argv argument vector. + */ +int getopt_long_only(int nargc, char* const* nargv, const char* options, const struct option* long_options, int* idx); + +/* + * Previous MinGW implementation had... + */ +#ifndef HAVE_DECL_GETOPT +/* + * ...for the long form API only; keep this for compatibility. + */ +#define HAVE_DECL_GETOPT 1 +#endif + +#ifdef __cplusplus +} +#endif + +#endif /* !defined(__UNISTD_H_SOURCED__) && !defined(__GETOPT_LONG_H__) */ +#endif diff --git a/3rdparty/lib/x86/libpthreadGC2.a b/3rdparty/lib/x86/libpthreadGC2.a deleted file mode 100644 index df211759f80..00000000000 Binary files a/3rdparty/lib/x86/libpthreadGC2.a and /dev/null differ diff --git a/3rdparty/lib/x86/libpthreadGCE2.a b/3rdparty/lib/x86/libpthreadGCE2.a deleted file mode 100644 index 9c56202c582..00000000000 Binary files a/3rdparty/lib/x86/libpthreadGCE2.a and /dev/null differ diff --git a/3rdparty/lib/x86/pthreadVC2.lib b/3rdparty/lib/x86/pthreadVC2.lib deleted file mode 100644 index c20ee200db3..00000000000 Binary files a/3rdparty/lib/x86/pthreadVC2.lib and /dev/null differ diff --git a/3rdparty/lib/x86/pthreadVCE2.lib b/3rdparty/lib/x86/pthreadVCE2.lib deleted file mode 100644 index 7f05317ba28..00000000000 Binary files a/3rdparty/lib/x86/pthreadVCE2.lib and /dev/null differ diff --git a/3rdparty/lib/x86/pthreadVSE2.lib b/3rdparty/lib/x86/pthreadVSE2.lib deleted file mode 100644 index 3f3335d4698..00000000000 Binary files a/3rdparty/lib/x86/pthreadVSE2.lib and /dev/null differ diff --git a/3rdparty/dll/x64/pthreadGC2.dll b/3rdparty/pthreads/bin/pthreadGC2.dll similarity index 100% rename from 3rdparty/dll/x64/pthreadGC2.dll rename to 3rdparty/pthreads/bin/pthreadGC2.dll diff --git a/3rdparty/dll/x64/pthreadVC2.dll b/3rdparty/pthreads/bin/pthreadVC2.dll similarity index 100% rename from 3rdparty/dll/x64/pthreadVC2.dll rename to 3rdparty/pthreads/bin/pthreadVC2.dll diff --git a/3rdparty/include/pthread.h b/3rdparty/pthreads/include/pthread.h similarity index 100% rename from 3rdparty/include/pthread.h rename to 3rdparty/pthreads/include/pthread.h diff --git a/3rdparty/include/sched.h b/3rdparty/pthreads/include/sched.h similarity index 100% rename from 3rdparty/include/sched.h rename to 3rdparty/pthreads/include/sched.h diff --git a/3rdparty/include/semaphore.h b/3rdparty/pthreads/include/semaphore.h similarity index 100% rename from 3rdparty/include/semaphore.h rename to 3rdparty/pthreads/include/semaphore.h diff --git a/3rdparty/lib/x64/libpthreadGC2.a b/3rdparty/pthreads/lib/libpthreadGC2.a similarity index 100% rename from 3rdparty/lib/x64/libpthreadGC2.a rename to 3rdparty/pthreads/lib/libpthreadGC2.a diff --git a/3rdparty/lib/x64/pthreadVC2.lib b/3rdparty/pthreads/lib/pthreadVC2.lib similarity index 100% rename from 3rdparty/lib/x64/pthreadVC2.lib rename to 3rdparty/pthreads/lib/pthreadVC2.lib diff --git a/src/stb_image.h b/3rdparty/stb/include/stb_image.h similarity index 80% rename from src/stb_image.h rename to 3rdparty/stb/include/stb_image.h index a056138dd04..9eedabedc45 100644 --- a/src/stb_image.h +++ b/3rdparty/stb/include/stb_image.h @@ -1,5 +1,5 @@ -/* stb_image - v2.16 - public domain image loader - http://nothings.org/stb_image.h - no warranty implied; use at your own risk +/* stb_image - v2.30 - public domain image loader - http://nothings.org/stb + no warranty implied; use at your own risk Do this: #define STB_IMAGE_IMPLEMENTATION @@ -48,6 +48,20 @@ LICENSE RECENT REVISION HISTORY: + 2.30 (2024-05-31) avoid erroneous gcc warning + 2.29 (2023-05-xx) optimizations + 2.28 (2023-01-29) many error fixes, security errors, just tons of stuff + 2.27 (2021-07-11) document stbi_info better, 16-bit PNM support, bug fixes + 2.26 (2020-07-13) many minor fixes + 2.25 (2020-02-02) fix warnings + 2.24 (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically + 2.23 (2019-08-11) fix clang static analysis warning + 2.22 (2019-03-04) gif fixes, fix warnings + 2.21 (2019-02-25) fix typo in comment + 2.20 (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs + 2.19 (2018-02-11) fix warning + 2.18 (2018-01-30) fix warnings + 2.17 (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings 2.16 (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes 2.15 (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC 2.14 (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs @@ -74,29 +88,42 @@ RECENT REVISION HISTORY: Thatcher Ulrich (psd) Nicolas Guillemot (vertical flip) Ken Miller (pgm, ppm) Richard Mitton (16-bit PSD) github:urraka (animated gif) Junggon Kim (PNM comments) - Daniel Gibson (16-bit TGA) + Christopher Forseth (animated gif) Daniel Gibson (16-bit TGA) socks-the-fox (16-bit PNG) Jeremy Sawicki (handle all ImageNet JPGs) - Optimizations & bugfixes - Fabian "ryg" Giesen - Arseny Kapoulkine + Optimizations & bugfixes Mikhail Morozov (1-bit BMP) + Fabian "ryg" Giesen Anael Seghezzi (is-16-bit query) + Arseny Kapoulkine Simon Breuss (16-bit PNM) John-Mark Allen + Carmelo J Fdez-Aguera Bug & warning fixes - Marc LeBlanc David Woo Guillaume George Martins Mozeiko - Christpher Lloyd Jerry Jansson Joseph Thomson Phil Jordan - Dave Moore Roy Eltham Hayaki Saito Nathan Reed - Won Chun Luke Graham Johan Duparc Nick Verigakis - the Horde3D community Thomas Ruf Ronny Chevalier Baldur Karlsson - Janez Zemva John Bartholomew Michal Cichon github:rlyeh - Jonathan Blow Ken Hamada Tero Hanninen github:romigrou - Laurent Gomila Cort Stratton Sergio Gonzalez github:svdijk - Aruelien Pocheville Thibault Reuille Cass Everitt github:snagar - Ryamond Barbiero Paul Du Bois Engin Manap github:Zelex - Michaelangel007@github Philipp Wiesemann Dale Weiler github:grim210 - Oriol Ferrer Mesia Josh Tobin Matthew Gregan github:sammyhw - Blazej Dariusz Roszkowski Gregory Mullen github:phprus - Christian Floisand Kevin Schmidt github:poppolopoppo + Marc LeBlanc David Woo Guillaume George Martins Mozeiko + Christpher Lloyd Jerry Jansson Joseph Thomson Blazej Dariusz Roszkowski + Phil Jordan Dave Moore Roy Eltham + Hayaki Saito Nathan Reed Won Chun + Luke Graham Johan Duparc Nick Verigakis the Horde3D community + Thomas Ruf Ronny Chevalier github:rlyeh + Janez Zemva John Bartholomew Michal Cichon github:romigrou + Jonathan Blow Ken Hamada Tero Hanninen github:svdijk + Eugene Golushkov Laurent Gomila Cort Stratton github:snagar + Aruelien Pocheville Sergio Gonzalez Thibault Reuille github:Zelex + Cass Everitt Ryamond Barbiero github:grim210 + Paul Du Bois Engin Manap Aldo Culquicondor github:sammyhw + Philipp Wiesemann Dale Weiler Oriol Ferrer Mesia github:phprus + Josh Tobin Neil Bickford Matthew Gregan github:poppolopoppo + Julian Raschke Gregory Mullen Christian Floisand github:darealshinji + Baldur Karlsson Kevin Schmidt JR Smith github:Michaelangel007 + Brad Weinberger Matvey Cherevko github:mosra + Luca Sas Alexander Veselov Zack Middleton [reserved] + Ryan C. Gordon [reserved] [reserved] + DO NOT ADD YOUR NAME HERE + + Jacko Dirks + + To add your name to the credits, pick a random blank space in the middle and fill it. + 80% of merge conflicts on stb PRs are due to people adding their name at the end + of the credits. */ #ifndef STBI_INCLUDE_STB_IMAGE_H @@ -105,10 +132,8 @@ RECENT REVISION HISTORY: // DOCUMENTATION // // Limitations: -// - no 16-bit-per-channel PNG // - no 12-bit-per-channel JPEG // - no JPEGs with arithmetic coding -// - no 1-bit BMP // - GIF always returns *comp=4 // // Basic usage (see HDR discussion below for HDR usage): @@ -118,7 +143,7 @@ RECENT REVISION HISTORY: // // ... x = width, y = height, n = # 8-bit components per pixel ... // // ... replace '0' with '1'..'4' to force that many components per pixel // // ... but 'n' will always be the number that it would have been if you said 0 -// stbi_image_free(data) +// stbi_image_free(data); // // Standard parameters: // int *x -- outputs image width in pixels @@ -157,6 +182,42 @@ RECENT REVISION HISTORY: // // Paletted PNG, BMP, GIF, and PIC images are automatically depalettized. // +// To query the width, height and component count of an image without having to +// decode the full file, you can use the stbi_info family of functions: +// +// int x,y,n,ok; +// ok = stbi_info(filename, &x, &y, &n); +// // returns ok=1 and sets x, y, n if image is a supported format, +// // 0 otherwise. +// +// Note that stb_image pervasively uses ints in its public API for sizes, +// including sizes of memory buffers. This is now part of the API and thus +// hard to change without causing breakage. As a result, the various image +// loaders all have certain limits on image size; these differ somewhat +// by format but generally boil down to either just under 2GB or just under +// 1GB. When the decoded image would be larger than this, stb_image decoding +// will fail. +// +// Additionally, stb_image will reject image files that have any of their +// dimensions set to a larger value than the configurable STBI_MAX_DIMENSIONS, +// which defaults to 2**24 = 16777216 pixels. Due to the above memory limit, +// the only way to have an image with such dimensions load correctly +// is for it to have a rather extreme aspect ratio. Either way, the +// assumption here is that such larger images are likely to be malformed +// or malicious. If you do need to load an image with individual dimensions +// larger than that, and it still fits in the overall size limit, you can +// #define STBI_MAX_DIMENSIONS on your own to be something larger. +// +// =========================================================================== +// +// UNICODE: +// +// If compiling for Windows and you wish to use Unicode filenames, compile +// with +// #define STBI_WINDOWS_UTF8 +// and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert +// Windows wchar_t filenames to utf8. +// // =========================================================================== // // Philosophy @@ -169,12 +230,12 @@ RECENT REVISION HISTORY: // // Sometimes I let "good performance" creep up in priority over "easy to maintain", // and for best performance I may provide less-easy-to-use APIs that give higher -// performance, in addition to the easy to use ones. Nevertheless, it's important +// performance, in addition to the easy-to-use ones. Nevertheless, it's important // to keep in mind that from the standpoint of you, a client of this library, // all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all. // // Some secondary priorities arise directly from the first two, some of which -// make more explicit reasons why performance can't be emphasized. +// provide more explicit reasons why performance can't be emphasized. // // - Portable ("ease of use") // - Small source code footprint ("easy to maintain") @@ -217,11 +278,10 @@ RECENT REVISION HISTORY: // // HDR image support (disable by defining STBI_NO_HDR) // -// stb_image now supports loading HDR images in general, and currently -// the Radiance .HDR file format, although the support is provided -// generically. You can still load any file through the existing interface; -// if you attempt to load an HDR file, it will be automatically remapped to -// LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1; +// stb_image supports loading HDR images in general, and currently the Radiance +// .HDR file format specifically. You can still load any file through the existing +// interface; if you attempt to load an HDR file, it will be automatically remapped +// to LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1; // both of these constants can be reconfigured through this interface: // // stbi_hdr_to_ldr_gamma(2.2f); @@ -253,11 +313,10 @@ RECENT REVISION HISTORY: // // iPhone PNG support: // -// By default we convert iphone-formatted PNGs back to RGB, even though -// they are internally encoded differently. You can disable this conversion -// by by calling stbi_convert_iphone_png_to_rgb(0), in which case -// you will always just get the native iphone "format" through (which -// is BGR stored in RGB). +// We optionally support converting iPhone-formatted PNGs (which store +// premultiplied BGRA) back to RGB, even though they're internally encoded +// differently. To enable this conversion, call +// stbi_convert_iphone_png_to_rgb(1). // // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per // pixel to remove any premultiplied alpha *only* if the image file explicitly @@ -299,7 +358,14 @@ RECENT REVISION HISTORY: // - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still // want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB // - +// - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater +// than that size (in either width or height) without further processing. +// This is to let programs in the wild set an upper bound to prevent +// denial-of-service attacks on untrusted data, as one could generate a +// valid image of gigantic dimensions and force stb_image to allocate a +// huge block of memory and spend disproportionate time decoding it. By +// default this is set to (1 << 24), which is 16777216, but that's still +// very big. #ifndef STBI_NO_STDIO #include @@ -317,6 +383,7 @@ enum STBI_rgb_alpha = 4 }; +#include typedef unsigned char stbi_uc; typedef unsigned short stbi_us; @@ -324,11 +391,13 @@ typedef unsigned short stbi_us; extern "C" { #endif +#ifndef STBIDEF #ifdef STB_IMAGE_STATIC #define STBIDEF static #else #define STBIDEF extern #endif +#endif ////////////////////////////////////////////////////////////////////////////// // @@ -360,6 +429,14 @@ STBIDEF stbi_uc *stbi_load_from_file (FILE *f, int *x, int *y, int *channels_in // for stbi_load_from_file, file pointer is left pointing immediately after image #endif +#ifndef STBI_NO_GIF +STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp); +#endif + +#ifdef STBI_WINDOWS_UTF8 +STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input); +#endif + //////////////////////////////////// // // 16-bits-per-channel interface @@ -407,7 +484,7 @@ STBIDEF int stbi_is_hdr_from_file(FILE *f); // get a VERY brief reason for failure -// NOT THREADSAFE +// on most compilers (and ALL modern mainstream compilers) this is threadsafe STBIDEF const char *stbi_failure_reason (void); // free the loaded image -- this is just free() @@ -416,11 +493,14 @@ STBIDEF void stbi_image_free (void *retval_from_stbi_load); // get image dimensions & components without fully decoding STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp); STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp); +STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len); +STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user); #ifndef STBI_NO_STDIO -STBIDEF int stbi_info (char const *filename, int *x, int *y, int *comp); -STBIDEF int stbi_info_from_file (FILE *f, int *x, int *y, int *comp); - +STBIDEF int stbi_info (char const *filename, int *x, int *y, int *comp); +STBIDEF int stbi_info_from_file (FILE *f, int *x, int *y, int *comp); +STBIDEF int stbi_is_16_bit (char const *filename); +STBIDEF int stbi_is_16_bit_from_file(FILE *f); #endif @@ -437,6 +517,13 @@ STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert); // flip the image vertically, so the first pixel in the output array is the bottom left STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip); +// as above, but only applies to images loaded on the thread that calls the function +// this function is only available if your compiler supports thread-local variables; +// calling it will fail to link if your compiler doesn't +STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply); +STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert); +STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip); + // ZLIB client - used by PNG, available for other purposes STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen); @@ -504,7 +591,7 @@ STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const ch #include #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) -#include // ldexp +#include // ldexp, pow #endif #ifndef STBI_NO_STDIO @@ -516,6 +603,12 @@ STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const ch #define STBI_ASSERT(x) assert(x) #endif +#ifdef __cplusplus +#define STBI_EXTERN extern "C" +#else +#define STBI_EXTERN extern +#endif + #ifndef _MSC_VER #ifdef __cplusplus @@ -527,8 +620,25 @@ STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const ch #define stbi_inline __forceinline #endif +#ifndef STBI_NO_THREAD_LOCALS + #if defined(__cplusplus) && __cplusplus >= 201103L + #define STBI_THREAD_LOCAL thread_local + #elif defined(__GNUC__) && __GNUC__ < 5 + #define STBI_THREAD_LOCAL __thread + #elif defined(_MSC_VER) + #define STBI_THREAD_LOCAL __declspec(thread) + #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__) + #define STBI_THREAD_LOCAL _Thread_local + #endif -#ifdef _MSC_VER + #ifndef STBI_THREAD_LOCAL + #if defined(__GNUC__) + #define STBI_THREAD_LOCAL __thread + #endif + #endif +#endif + +#if defined(_MSC_VER) || defined(__SYMBIAN32__) typedef unsigned short stbi__uint16; typedef signed short stbi__int16; typedef unsigned int stbi__uint32; @@ -557,7 +667,7 @@ typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1]; #ifdef STBI_HAS_LROTL #define stbi_lrot(x,y) _lrotl(x,y) #else - #define stbi_lrot(x,y) (((x) << (y)) | ((x) >> (32 - (y)))) + #define stbi_lrot(x,y) (((x) << (y)) | ((x) >> (-(y) & 31))) #endif #if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED)) @@ -640,14 +750,18 @@ static int stbi__cpuid3(void) #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name +#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2) static int stbi__sse2_available(void) { int info3 = stbi__cpuid3(); return ((info3 >> 26) & 1) != 0; } +#endif + #else // assume GCC-style if not VC++ #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16))) +#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2) static int stbi__sse2_available(void) { // If we're even attempting to compile this on GCC/Clang, that means @@ -655,6 +769,8 @@ static int stbi__sse2_available(void) // instructions at will, and so are we. return 1; } +#endif + #endif #endif @@ -665,14 +781,21 @@ static int stbi__sse2_available(void) #ifdef STBI_NEON #include -// assume GCC or Clang on ARM targets +#ifdef _MSC_VER +#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name +#else #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16))) #endif +#endif #ifndef STBI_SIMD_ALIGN #define STBI_SIMD_ALIGN(type, name) type name #endif +#ifndef STBI_MAX_DIMENSIONS +#define STBI_MAX_DIMENSIONS (1 << 24) +#endif + /////////////////////////////////////////////// // // stbi__context struct and start_xxx functions @@ -690,6 +813,7 @@ typedef struct int read_from_callbacks; int buflen; stbi_uc buffer_start[128]; + int callback_already_read; stbi_uc *img_buffer, *img_buffer_end; stbi_uc *img_buffer_original, *img_buffer_original_end; @@ -703,6 +827,7 @@ static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len) { s->io.read = NULL; s->read_from_callbacks = 0; + s->callback_already_read = 0; s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer; s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len; } @@ -714,7 +839,8 @@ static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void * s->io_user_data = user; s->buflen = sizeof(s->buffer_start); s->read_from_callbacks = 1; - s->img_buffer_original = s->buffer_start; + s->callback_already_read = 0; + s->img_buffer = s->img_buffer_original = s->buffer_start; stbi__refill_buffer(s); s->img_buffer_original_end = s->img_buffer_end; } @@ -728,12 +854,17 @@ static int stbi__stdio_read(void *user, char *data, int size) static void stbi__stdio_skip(void *user, int n) { + int ch; fseek((FILE*) user, n, SEEK_CUR); + ch = fgetc((FILE*) user); /* have to read a byte to reset feof()'s flag */ + if (ch != EOF) { + ungetc(ch, (FILE *) user); /* push byte back onto stream if valid. */ + } } static int stbi__stdio_eof(void *user) { - return feof((FILE*) user); + return feof((FILE*) user) || ferror((FILE *) user); } static stbi_io_callbacks stbi__stdio_callbacks = @@ -784,6 +915,7 @@ static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp); static int stbi__png_test(stbi__context *s); static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri); static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp); +static int stbi__png_is16(stbi__context *s); #endif #ifndef STBI_NO_BMP @@ -802,6 +934,7 @@ static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp); static int stbi__psd_test(stbi__context *s); static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc); static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp); +static int stbi__psd_is16(stbi__context *s); #endif #ifndef STBI_NO_HDR @@ -819,6 +952,7 @@ static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp); #ifndef STBI_NO_GIF static int stbi__gif_test(stbi__context *s); static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri); +static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp); static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp); #endif @@ -826,21 +960,27 @@ static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp); static int stbi__pnm_test(stbi__context *s); static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri); static int stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp); +static int stbi__pnm_is16(stbi__context *s); #endif -// this is not threadsafe -static const char *stbi__g_failure_reason; +static +#ifdef STBI_THREAD_LOCAL +STBI_THREAD_LOCAL +#endif +const char *stbi__g_failure_reason; STBIDEF const char *stbi_failure_reason(void) { return stbi__g_failure_reason; } +#ifndef STBI_NO_FAILURE_STRINGS static int stbi__err(const char *str) { stbi__g_failure_reason = str; return 0; } +#endif static void *stbi__malloc(size_t size) { @@ -879,11 +1019,13 @@ static int stbi__mul2sizes_valid(int a, int b) return a <= INT_MAX/b; } +#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR) // returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow static int stbi__mad2sizes_valid(int a, int b, int add) { return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add); } +#endif // returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow static int stbi__mad3sizes_valid(int a, int b, int c, int add) @@ -893,18 +1035,22 @@ static int stbi__mad3sizes_valid(int a, int b, int c, int add) } // returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow +#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM) static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add) { return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) && stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add); } +#endif +#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR) // mallocs with size overflow checking static void *stbi__malloc_mad2(int a, int b, int add) { if (!stbi__mad2sizes_valid(a, b, add)) return NULL; return stbi__malloc(a*b + add); } +#endif static void *stbi__malloc_mad3(int a, int b, int c, int add) { @@ -912,11 +1058,30 @@ static void *stbi__malloc_mad3(int a, int b, int c, int add) return stbi__malloc(a*b*c + add); } +#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM) static void *stbi__malloc_mad4(int a, int b, int c, int d, int add) { if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL; return stbi__malloc(a*b*c*d + add); } +#endif + +// returns 1 if the sum of two signed ints is valid (between -2^31 and 2^31-1 inclusive), 0 on overflow. +static int stbi__addints_valid(int a, int b) +{ + if ((a >= 0) != (b >= 0)) return 1; // a and b have different signs, so no overflow + if (a < 0 && b < 0) return a >= INT_MIN - b; // same as a + b >= INT_MIN; INT_MIN - b cannot overflow since b < 0. + return a <= INT_MAX - b; +} + +// returns 1 if the product of two ints fits in a signed short, 0 on overflow. +static int stbi__mul2shorts_valid(int a, int b) +{ + if (b == 0 || b == -1) return 1; // multiplication by 0 is always 0; check for -1 so SHRT_MIN/b doesn't overflow + if ((a >= 0) == (b >= 0)) return a <= SHRT_MAX/b; // product is positive, so similar to mul2sizes_valid + if (b < 0) return a <= SHRT_MIN / b; // same as a * b >= SHRT_MIN + return a >= SHRT_MIN / b; +} // stbi__err - error // stbi__errpf - error returning pointer to float @@ -946,13 +1111,29 @@ static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp); static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp); #endif -static int stbi__vertically_flip_on_load = 0; +static int stbi__vertically_flip_on_load_global = 0; STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip) { - stbi__vertically_flip_on_load = flag_true_if_should_flip; + stbi__vertically_flip_on_load_global = flag_true_if_should_flip; } +#ifndef STBI_THREAD_LOCAL +#define stbi__vertically_flip_on_load stbi__vertically_flip_on_load_global +#else +static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local, stbi__vertically_flip_on_load_set; + +STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip) +{ + stbi__vertically_flip_on_load_local = flag_true_if_should_flip; + stbi__vertically_flip_on_load_set = 1; +} + +#define stbi__vertically_flip_on_load (stbi__vertically_flip_on_load_set \ + ? stbi__vertically_flip_on_load_local \ + : stbi__vertically_flip_on_load_global) +#endif // STBI_THREAD_LOCAL + static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc) { memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields @@ -960,9 +1141,8 @@ static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int re ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order ri->num_channels = 0; - #ifndef STBI_NO_JPEG - if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri); - #endif + // test the formats with a very explicit header first (at least a FOURCC + // or distinctive magic number first) #ifndef STBI_NO_PNG if (stbi__png_test(s)) return stbi__png_load(s,x,y,comp,req_comp, ri); #endif @@ -974,10 +1154,19 @@ static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int re #endif #ifndef STBI_NO_PSD if (stbi__psd_test(s)) return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc); + #else + STBI_NOTUSED(bpc); #endif #ifndef STBI_NO_PIC if (stbi__pic_test(s)) return stbi__pic_load(s,x,y,comp,req_comp, ri); #endif + + // then the formats that can end up attempting to load with just 1 or 2 + // bytes matching expectations; these are prone to false positives, so + // try them later + #ifndef STBI_NO_JPEG + if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri); + #endif #ifndef STBI_NO_PNM if (stbi__pnm_test(s)) return stbi__pnm_load(s,x,y,comp,req_comp, ri); #endif @@ -1054,6 +1243,20 @@ static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel) } } +#ifndef STBI_NO_GIF +static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel) +{ + int slice; + int slice_size = w * h * bytes_per_pixel; + + stbi_uc *bytes = (stbi_uc *)image; + for (slice = 0; slice < z; ++slice) { + stbi__vertical_flip(bytes, w, h, bytes_per_pixel); + bytes += slice_size; + } +} +#endif + static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp) { stbi__result_info ri; @@ -1062,8 +1265,10 @@ static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, if (result == NULL) return NULL; + // it is the responsibility of the loaders to make sure we get either 8 or 16 bit. + STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16); + if (ri.bits_per_channel != 8) { - STBI_ASSERT(ri.bits_per_channel == 16); result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp); ri.bits_per_channel = 8; } @@ -1086,8 +1291,10 @@ static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, if (result == NULL) return NULL; + // it is the responsibility of the loaders to make sure we get either 8 or 16 bit. + STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16); + if (ri.bits_per_channel != 16) { - STBI_ASSERT(ri.bits_per_channel == 8); result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp); ri.bits_per_channel = 16; } @@ -1103,7 +1310,7 @@ static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, return (stbi__uint16 *) result; } -#ifndef STBI_NO_HDR +#if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR) static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp) { if (stbi__vertically_flip_on_load && result != NULL) { @@ -1115,10 +1322,38 @@ static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, in #ifndef STBI_NO_STDIO +#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8) +STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide); +STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default); +#endif + +#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8) +STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input) +{ + return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL); +} +#endif + static FILE *stbi__fopen(char const *filename, char const *mode) { FILE *f; +#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8) + wchar_t wMode[64]; + wchar_t wFilename[1024]; + if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename))) + return 0; + + if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode))) + return 0; + #if defined(_MSC_VER) && _MSC_VER >= 1400 + if (0 != _wfopen_s(&f, wFilename, wMode)) + f = 0; +#else + f = _wfopen(wFilename, wMode); +#endif + +#elif defined(_MSC_VER) && _MSC_VER >= 1400 if (0 != fopen_s(&f, filename, mode)) f=0; #else @@ -1205,6 +1440,22 @@ STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *u return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp); } +#ifndef STBI_NO_GIF +STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp) +{ + unsigned char *result; + stbi__context s; + stbi__start_mem(&s,buffer,len); + + result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp); + if (stbi__vertically_flip_on_load) { + stbi__vertical_flip_slices( result, *x, *y, *z, *comp ); + } + + return result; +} +#endif + #ifndef STBI_NO_LINEAR static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp) { @@ -1288,12 +1539,16 @@ STBIDEF int stbi_is_hdr (char const *filename) return result; } -STBIDEF int stbi_is_hdr_from_file(FILE *f) +STBIDEF int stbi_is_hdr_from_file(FILE *f) { #ifndef STBI_NO_HDR + long pos = ftell(f); + int res; stbi__context s; stbi__start_file(&s,f); - return stbi__hdr_test(&s); + res = stbi__hdr_test(&s); + fseek(f, pos, SEEK_SET); + return res; #else STBI_NOTUSED(f); return 0; @@ -1342,6 +1597,7 @@ enum static void stbi__refill_buffer(stbi__context *s) { int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen); + s->callback_already_read += (int) (s->img_buffer - s->img_buffer_original); if (n == 0) { // at end of file, treat same as if from memory, but need to handle case // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file @@ -1366,6 +1622,9 @@ stbi_inline static stbi_uc stbi__get8(stbi__context *s) return 0; } +#if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM) +// nothing +#else stbi_inline static int stbi__at_eof(stbi__context *s) { if (s->io.read) { @@ -1377,9 +1636,14 @@ stbi_inline static int stbi__at_eof(stbi__context *s) return s->img_buffer >= s->img_buffer_end; } +#endif +#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) +// nothing +#else static void stbi__skip(stbi__context *s, int n) { + if (n == 0) return; // already there! if (n < 0) { s->img_buffer = s->img_buffer_end; return; @@ -1394,7 +1658,11 @@ static void stbi__skip(stbi__context *s, int n) } s->img_buffer += n; } +#endif +#if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && defined(STBI_NO_PNM) +// nothing +#else static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n) { if (s->io.read) { @@ -1418,18 +1686,27 @@ static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n) } else return 0; } +#endif +#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC) +// nothing +#else static int stbi__get16be(stbi__context *s) { int z = stbi__get8(s); return (z << 8) + stbi__get8(s); } +#endif +#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC) +// nothing +#else static stbi__uint32 stbi__get32be(stbi__context *s) { stbi__uint32 z = stbi__get16be(s); return (z << 16) + stbi__get16be(s); } +#endif #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) // nothing @@ -1445,13 +1722,16 @@ static int stbi__get16le(stbi__context *s) static stbi__uint32 stbi__get32le(stbi__context *s) { stbi__uint32 z = stbi__get16le(s); - return z + (stbi__get16le(s) << 16); + z += (stbi__uint32)stbi__get16le(s) << 16; + return z; } #endif #define STBI__BYTECAST(x) ((stbi_uc) ((x) & 255)) // truncate int to byte without warnings - +#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM) +// nothing +#else ////////////////////////////////////////////////////////////////////////////// // // generic converter from built-in img_n to req_comp @@ -1467,7 +1747,11 @@ static stbi_uc stbi__compute_y(int r, int g, int b) { return (stbi_uc) (((r*77) + (g*150) + (29*b)) >> 8); } +#endif +#if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM) +// nothing +#else static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y) { int i,j; @@ -1491,19 +1775,19 @@ static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int r // convert source image with img_n components to one with req_comp components; // avoid switch per pixel, so use switch per scanline and massive macros switch (STBI__COMBO(img_n, req_comp)) { - STBI__CASE(1,2) { dest[0]=src[0], dest[1]=255; } break; + STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=255; } break; STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0]; } break; - STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0], dest[3]=255; } break; + STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=255; } break; STBI__CASE(2,1) { dest[0]=src[0]; } break; STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0]; } break; - STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; } break; - STBI__CASE(3,4) { dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=255; } break; + STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1]; } break; + STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=255; } break; STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); } break; - STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = 255; } break; + STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = 255; } break; STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); } break; - STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]), dest[1] = src[3]; } break; - STBI__CASE(4,3) { dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; } break; - default: STBI_ASSERT(0); + STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = src[3]; } break; + STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2]; } break; + default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return stbi__errpuc("unsupported", "Unsupported format conversion"); } #undef STBI__CASE } @@ -1511,12 +1795,20 @@ static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int r STBI_FREE(data); return good; } +#endif +#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) +// nothing +#else static stbi__uint16 stbi__compute_y_16(int r, int g, int b) { return (stbi__uint16) (((r*77) + (g*150) + (29*b)) >> 8); } +#endif +#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) +// nothing +#else static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y) { int i,j; @@ -1540,19 +1832,19 @@ static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int r // convert source image with img_n components to one with req_comp components; // avoid switch per pixel, so use switch per scanline and massive macros switch (STBI__COMBO(img_n, req_comp)) { - STBI__CASE(1,2) { dest[0]=src[0], dest[1]=0xffff; } break; + STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=0xffff; } break; STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0]; } break; - STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0], dest[3]=0xffff; } break; + STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=0xffff; } break; STBI__CASE(2,1) { dest[0]=src[0]; } break; STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0]; } break; - STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0], dest[3]=src[1]; } break; - STBI__CASE(3,4) { dest[0]=src[0],dest[1]=src[1],dest[2]=src[2],dest[3]=0xffff; } break; + STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1]; } break; + STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=0xffff; } break; STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); } break; - STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]), dest[1] = 0xffff; } break; + STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = 0xffff; } break; STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); } break; - STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]), dest[1] = src[3]; } break; - STBI__CASE(4,3) { dest[0]=src[0],dest[1]=src[1],dest[2]=src[2]; } break; - default: STBI_ASSERT(0); + STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = src[3]; } break; + STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2]; } break; + default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return (stbi__uint16*) stbi__errpuc("unsupported", "Unsupported format conversion"); } #undef STBI__CASE } @@ -1560,6 +1852,7 @@ static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int r STBI_FREE(data); return good; } +#endif #ifndef STBI_NO_LINEAR static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp) @@ -1575,7 +1868,11 @@ static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp) for (k=0; k < n; ++k) { output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale); } - if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f; + } + if (n < comp) { + for (i=0; i < x*y; ++i) { + output[i*comp + n] = data[i*comp + n]/255.0f; + } } STBI_FREE(data); return output; @@ -1705,11 +2002,15 @@ typedef struct static int stbi__build_huffman(stbi__huffman *h, int *count) { - int i,j,k=0,code; + int i,j,k=0; + unsigned int code; // build size list for each symbol (from JPEG spec) - for (i=0; i < 16; ++i) - for (j=0; j < count[i]; ++j) + for (i=0; i < 16; ++i) { + for (j=0; j < count[i]; ++j) { h->size[k++] = (stbi_uc) (i+1); + if(k >= 257) return stbi__err("bad size list","Corrupt JPEG"); + } + } h->size[k] = 0; // compute actual symbols (from jpeg spec) @@ -1721,7 +2022,7 @@ static int stbi__build_huffman(stbi__huffman *h, int *count) if (h->size[k] == j) { while (h->size[k] == j) h->code[k++] = (stbi__uint16) (code++); - if (code-1 >= (1 << j)) return stbi__err("bad code lengths","Corrupt JPEG"); + if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG"); } // compute largest code + 1 for this size, preshifted as needed later h->maxcode[j] = code << (16-j); @@ -1765,7 +2066,7 @@ static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h) if (k < m) k += (~0U << magbits) + 1; // if the result is small enough, we can fit it in fast_ac table if (k >= -128 && k <= 127) - fast_ac[i] = (stbi__int16) ((k << 8) + (run << 4) + (len + magbits)); + fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits)); } } } @@ -1774,7 +2075,7 @@ static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h) static void stbi__grow_buffer_unsafe(stbi__jpeg *j) { do { - int b = j->nomore ? 0 : stbi__get8(j->s); + unsigned int b = j->nomore ? 0 : stbi__get8(j->s); if (b == 0xff) { int c = stbi__get8(j->s); while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes @@ -1790,7 +2091,7 @@ static void stbi__grow_buffer_unsafe(stbi__jpeg *j) } // (1 << n) - 1 -static stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535}; +static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535}; // decode a jpeg huffman value from the bitstream stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h) @@ -1834,6 +2135,8 @@ stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h) // convert the huffman code to the symbol id c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k]; + if(c < 0 || c >= 256) // symbol id out of bounds! + return -1; STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]); // convert the id to a symbol @@ -1843,7 +2146,7 @@ stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h) } // bias[n] = (-1<code_bits < n) stbi__grow_buffer_unsafe(j); + if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s intead of continuing - sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB + sgn = j->code_buffer >> 31; // sign bit always in MSB; 0 if MSB clear (positive), 1 if MSB set (negative) k = stbi_lrot(j->code_buffer, n); - STBI_ASSERT(n >= 0 && n < (int) (sizeof(stbi__bmask)/sizeof(*stbi__bmask))); j->code_buffer = k & ~stbi__bmask[n]; k &= stbi__bmask[n]; j->code_bits -= n; - return k + (stbi__jbias[n] & ~sgn); + return k + (stbi__jbias[n] & (sgn - 1)); } // get some unsigned bits @@ -1867,6 +2170,7 @@ stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n) { unsigned int k; if (j->code_bits < n) stbi__grow_buffer_unsafe(j); + if (j->code_bits < n) return 0; // ran out of bits from stream, return 0s intead of continuing k = stbi_lrot(j->code_buffer, n); j->code_buffer = k & ~stbi__bmask[n]; k &= stbi__bmask[n]; @@ -1878,6 +2182,7 @@ stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j) { unsigned int k; if (j->code_bits < 1) stbi__grow_buffer_unsafe(j); + if (j->code_bits < 1) return 0; // ran out of bits from stream, return 0s intead of continuing k = j->code_buffer; j->code_buffer <<= 1; --j->code_bits; @@ -1886,7 +2191,7 @@ stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j) // given a value that's at position X in the zigzag stream, // where does it appear in the 8x8 matrix coded as row-major? -static stbi_uc stbi__jpeg_dezigzag[64+15] = +static const stbi_uc stbi__jpeg_dezigzag[64+15] = { 0, 1, 8, 16, 9, 2, 3, 10, 17, 24, 32, 25, 18, 11, 4, 5, @@ -1909,14 +2214,16 @@ static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman if (j->code_bits < 16) stbi__grow_buffer_unsafe(j); t = stbi__jpeg_huff_decode(j, hdc); - if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG"); + if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG"); // 0 all the ac values now so we can do it 32-bits at a time memset(data,0,64*sizeof(data[0])); diff = t ? stbi__extend_receive(j, t) : 0; + if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta","Corrupt JPEG"); dc = j->img_comp[b].dc_pred + diff; j->img_comp[b].dc_pred = dc; + if (!stbi__mul2shorts_valid(dc, dequant[0])) return stbi__err("can't merge dc and ac", "Corrupt JPEG"); data[0] = (short) (dc * dequant[0]); // decode AC components, see JPEG spec @@ -1930,6 +2237,7 @@ static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman if (r) { // fast-AC path k += (r >> 4) & 15; // run s = r & 15; // combined length + if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available"); j->code_buffer <<= s; j->code_bits -= s; // decode into unzigzag'd location @@ -1966,11 +2274,14 @@ static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__ // first scan for DC coefficient, must be first memset(data,0,64*sizeof(data[0])); // 0 all the ac values now t = stbi__jpeg_huff_decode(j, hdc); + if (t < 0 || t > 15) return stbi__err("can't merge dc and ac", "Corrupt JPEG"); diff = t ? stbi__extend_receive(j, t) : 0; + if (!stbi__addints_valid(j->img_comp[b].dc_pred, diff)) return stbi__err("bad delta", "Corrupt JPEG"); dc = j->img_comp[b].dc_pred + diff; j->img_comp[b].dc_pred = dc; - data[0] = (short) (dc << j->succ_low); + if (!stbi__mul2shorts_valid(dc, 1 << j->succ_low)) return stbi__err("can't merge dc and ac", "Corrupt JPEG"); + data[0] = (short) (dc * (1 << j->succ_low)); } else { // refinement scan for DC coefficient if (stbi__jpeg_get_bit(j)) @@ -2004,10 +2315,11 @@ static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__ if (r) { // fast-AC path k += (r >> 4) & 15; // run s = r & 15; // combined length + if (s > j->code_bits) return stbi__err("bad huffman code", "Combined length longer than code bits available"); j->code_buffer <<= s; j->code_bits -= s; zig = stbi__jpeg_dezigzag[k++]; - data[zig] = (short) ((r >> 8) << shift); + data[zig] = (short) ((r >> 8) * (1 << shift)); } else { int rs = stbi__jpeg_huff_decode(j, hac); if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG"); @@ -2025,7 +2337,7 @@ static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__ } else { k += r; zig = stbi__jpeg_dezigzag[k++]; - data[zig] = (short) (stbi__extend_receive(j,s) << shift); + data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift)); } } } while (k <= j->spec_end); @@ -2112,7 +2424,7 @@ stbi_inline static stbi_uc stbi__clamp(int x) } #define stbi__f2f(x) ((int) (((x) * 4096 + 0.5))) -#define stbi__fsh(x) ((x) << 12) +#define stbi__fsh(x) ((x) * 4096) // derived from jidctint -- DCT_ISLOW #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \ @@ -2167,7 +2479,7 @@ static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64]) // (1|2|3|4|5|6|7)==0 0 seconds // all separate -0.047 seconds // 1 && 2|3 && 4|5 && 6|7: -0.047 seconds - int dcterm = d[0] << 2; + int dcterm = d[0]*4; v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm; } else { STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56]) @@ -2824,6 +3136,7 @@ static int stbi__process_marker(stbi__jpeg *z, int m) sizes[i] = stbi__get8(z->s); n += sizes[i]; } + if(n > 256) return stbi__err("bad DHT header","Corrupt JPEG"); // Loop over i < n would write past end of values! L -= 17; if (tc == 0) { if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0; @@ -2956,6 +3269,8 @@ static int stbi__process_frame_header(stbi__jpeg *z, int scan) p = stbi__get8(s); if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline s->img_y = stbi__get16be(s); if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG s->img_x = stbi__get16be(s); if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires + if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)"); + if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)"); c = stbi__get8(s); if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG"); s->img_n = c; @@ -2968,7 +3283,7 @@ static int stbi__process_frame_header(stbi__jpeg *z, int scan) z->rgb = 0; for (i=0; i < s->img_n; ++i) { - static unsigned char rgb[3] = { 'R', 'G', 'B' }; + static const unsigned char rgb[3] = { 'R', 'G', 'B' }; z->img_comp[i].id = stbi__get8(s); if (s->img_n == 3 && z->img_comp[i].id == rgb[i]) ++z->rgb; @@ -2987,6 +3302,13 @@ static int stbi__process_frame_header(stbi__jpeg *z, int scan) if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v; } + // check that plane subsampling factors are integer ratios; our resamplers can't deal with fractional ratios + // and I've never seen a non-corrupted JPEG file actually use them + for (i=0; i < s->img_n; ++i) { + if (h_max % z->img_comp[i].h != 0) return stbi__err("bad H","Corrupt JPEG"); + if (v_max % z->img_comp[i].v != 0) return stbi__err("bad V","Corrupt JPEG"); + } + // compute interleaved mcu info z->img_h_max = h_max; z->img_v_max = v_max; @@ -3064,6 +3386,28 @@ static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan) return 1; } +static stbi_uc stbi__skip_jpeg_junk_at_end(stbi__jpeg *j) +{ + // some JPEGs have junk at end, skip over it but if we find what looks + // like a valid marker, resume there + while (!stbi__at_eof(j->s)) { + stbi_uc x = stbi__get8(j->s); + while (x == 0xff) { // might be a marker + if (stbi__at_eof(j->s)) return STBI__MARKER_none; + x = stbi__get8(j->s); + if (x != 0x00 && x != 0xff) { + // not a stuffed zero or lead-in to another marker, looks + // like an actual marker, return it + return x; + } + // stuffed zero has x=0 now which ends the loop, meaning we go + // back to regular scan loop. + // repeated 0xff keeps trying to read the next byte of the marker. + } + } + return STBI__MARKER_none; +} + // decode image to YCbCr format static int stbi__decode_jpeg_image(stbi__jpeg *j) { @@ -3080,25 +3424,22 @@ static int stbi__decode_jpeg_image(stbi__jpeg *j) if (!stbi__process_scan_header(j)) return 0; if (!stbi__parse_entropy_coded_data(j)) return 0; if (j->marker == STBI__MARKER_none ) { - // handle 0s at the end of image data from IP Kamera 9060 - while (!stbi__at_eof(j->s)) { - int x = stbi__get8(j->s); - if (x == 255) { - j->marker = stbi__get8(j->s); - break; - } - } + j->marker = stbi__skip_jpeg_junk_at_end(j); // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0 } + m = stbi__get_marker(j); + if (STBI__RESTART(m)) + m = stbi__get_marker(j); } else if (stbi__DNL(m)) { int Ld = stbi__get16be(j->s); stbi__uint32 NL = stbi__get16be(j->s); - if (Ld != 4) stbi__err("bad DNL len", "Corrupt JPEG"); - if (NL != j->s->img_y) stbi__err("bad DNL height", "Corrupt JPEG"); + if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG"); + if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG"); + m = stbi__get_marker(j); } else { - if (!stbi__process_marker(j, m)) return 0; + if (!stbi__process_marker(j, m)) return 1; + m = stbi__get_marker(j); } - m = stbi__get_marker(j); } if (j->progressive) stbi__jpeg_finish(j); @@ -3542,12 +3883,16 @@ static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp else decode_n = z->s->img_n; + // nothing to do if no components requested; check this now to avoid + // accessing uninitialized coutput[0] later + if (decode_n <= 0) { stbi__cleanup_jpeg(z); return NULL; } + // resample and color-convert { int k; unsigned int i,j; stbi_uc *output; - stbi_uc *coutput[4]; + stbi_uc *coutput[4] = { NULL, NULL, NULL, NULL }; stbi__resample res_comp[4]; @@ -3668,7 +4013,7 @@ static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp if (n == 1) for (i=0; i < z->s->img_x; ++i) out[i] = y[i]; else - for (i=0; i < z->s->img_x; ++i) *out++ = y[i], *out++ = 255; + for (i=0; i < z->s->img_x; ++i) { *out++ = y[i]; *out++ = 255; } } } } @@ -3684,6 +4029,8 @@ static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int re { unsigned char* result; stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg)); + if (!j) return stbi__errpuc("outofmem", "Out of memory"); + memset(j, 0, sizeof(stbi__jpeg)); STBI_NOTUSED(ri); j->s = s; stbi__setup_jpeg(j); @@ -3696,6 +4043,8 @@ static int stbi__jpeg_test(stbi__context *s) { int r; stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg)); + if (!j) return stbi__err("outofmem", "Out of memory"); + memset(j, 0, sizeof(stbi__jpeg)); j->s = s; stbi__setup_jpeg(j); r = stbi__decode_jpeg_header(j, STBI__SCAN_type); @@ -3720,6 +4069,8 @@ static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp) { int result; stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg))); + if (!j) return stbi__err("outofmem", "Out of memory"); + memset(j, 0, sizeof(stbi__jpeg)); j->s = s; result = stbi__jpeg_info_raw(j, x, y, comp); STBI_FREE(j); @@ -3739,6 +4090,7 @@ static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp) // fast-way is faster to check than jpeg huffman, but slow way is slower #define STBI__ZFAST_BITS 9 // accelerate all cases in default tables #define STBI__ZFAST_MASK ((1 << STBI__ZFAST_BITS) - 1) +#define STBI__ZNSYMS 288 // number of symbols in literal/length alphabet // zlib-style huffman encoding // (jpegs packs from left, zlib from right, so can't share code) @@ -3748,8 +4100,8 @@ typedef struct stbi__uint16 firstcode[16]; int maxcode[17]; stbi__uint16 firstsymbol[16]; - stbi_uc size[288]; - stbi__uint16 value[288]; + stbi_uc size[STBI__ZNSYMS]; + stbi__uint16 value[STBI__ZNSYMS]; } stbi__zhuffman; stbi_inline static int stbi__bitreverse16(int n) @@ -3826,6 +4178,7 @@ typedef struct { stbi_uc *zbuffer, *zbuffer_end; int num_bits; + int hit_zeof_once; stbi__uint32 code_buffer; char *zout; @@ -3836,16 +4189,23 @@ typedef struct stbi__zhuffman z_length, z_distance; } stbi__zbuf; +stbi_inline static int stbi__zeof(stbi__zbuf *z) +{ + return (z->zbuffer >= z->zbuffer_end); +} + stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z) { - if (z->zbuffer >= z->zbuffer_end) return 0; - return *z->zbuffer++; + return stbi__zeof(z) ? 0 : *z->zbuffer++; } static void stbi__fill_bits(stbi__zbuf *z) { do { - STBI_ASSERT(z->code_buffer < (1U << z->num_bits)); + if (z->code_buffer >= (1U << z->num_bits)) { + z->zbuffer = z->zbuffer_end; /* treat this as EOF so we fail. */ + return; + } z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits; z->num_bits += 8; } while (z->num_bits <= 24); @@ -3870,10 +4230,11 @@ static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z) for (s=STBI__ZFAST_BITS+1; ; ++s) if (k < z->maxcode[s]) break; - if (s == 16) return -1; // invalid code! + if (s >= 16) return -1; // invalid code! // code size is s, so: b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s]; - STBI_ASSERT(z->size[b] == s); + if (b >= STBI__ZNSYMS) return -1; // some data was corrupt somewhere! + if (z->size[b] != s) return -1; // was originally an assert, but report failure instead. a->code_buffer >>= s; a->num_bits -= s; return z->value[b]; @@ -3882,7 +4243,23 @@ static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z) stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z) { int b,s; - if (a->num_bits < 16) stbi__fill_bits(a); + if (a->num_bits < 16) { + if (stbi__zeof(a)) { + if (!a->hit_zeof_once) { + // This is the first time we hit eof, insert 16 extra padding btis + // to allow us to keep going; if we actually consume any of them + // though, that is invalid data. This is caught later. + a->hit_zeof_once = 1; + a->num_bits += 16; // add 16 implicit zero bits + } else { + // We already inserted our extra 16 padding bits and are again + // out, this stream is actually prematurely terminated. + return -1; + } + } else { + stbi__fill_bits(a); + } + } b = z->fast[a->code_buffer & STBI__ZFAST_MASK]; if (b) { s = b >> 9; @@ -3896,13 +4273,16 @@ stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z) static int stbi__zexpand(stbi__zbuf *z, char *zout, int n) // need to make room for n bytes { char *q; - int cur, limit, old_limit; + unsigned int cur, limit, old_limit; z->zout = zout; if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG"); - cur = (int) (z->zout - z->zout_start); - limit = old_limit = (int) (z->zout_end - z->zout_start); - while (cur + n > limit) + cur = (unsigned int) (z->zout - z->zout_start); + limit = old_limit = (unsigned) (z->zout_end - z->zout_start); + if (UINT_MAX - cur < (unsigned) n) return stbi__err("outofmem", "Out of memory"); + while (cur + n > limit) { + if(limit > UINT_MAX / 2) return stbi__err("outofmem", "Out of memory"); limit *= 2; + } q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit); STBI_NOTUSED(old_limit); if (q == NULL) return stbi__err("outofmem", "Out of memory"); @@ -3912,18 +4292,18 @@ static int stbi__zexpand(stbi__zbuf *z, char *zout, int n) // need to make room return 1; } -static int stbi__zlength_base[31] = { +static const int stbi__zlength_base[31] = { 3,4,5,6,7,8,9,10,11,13, 15,17,19,23,27,31,35,43,51,59, 67,83,99,115,131,163,195,227,258,0,0 }; -static int stbi__zlength_extra[31]= +static const int stbi__zlength_extra[31]= { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 }; -static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193, +static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193, 257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0}; -static int stbi__zdist_extra[32] = +static const int stbi__zdist_extra[32] = { 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13}; static int stbi__parse_huffman_block(stbi__zbuf *a) @@ -3943,17 +4323,25 @@ static int stbi__parse_huffman_block(stbi__zbuf *a) int len,dist; if (z == 256) { a->zout = zout; + if (a->hit_zeof_once && a->num_bits < 16) { + // The first time we hit zeof, we inserted 16 extra zero bits into our bit + // buffer so the decoder can just do its speculative decoding. But if we + // actually consumed any of those bits (which is the case when num_bits < 16), + // the stream actually read past the end so it is malformed. + return stbi__err("unexpected end","Corrupt PNG"); + } return 1; } + if (z >= 286) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, length codes 286 and 287 must not appear in compressed data z -= 257; len = stbi__zlength_base[z]; if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]); z = stbi__zhuffman_decode(a, &a->z_distance); - if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); + if (z < 0 || z >= 30) return stbi__err("bad huffman code","Corrupt PNG"); // per DEFLATE, distance codes 30 and 31 must not appear in compressed data dist = stbi__zdist_base[z]; if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]); if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG"); - if (zout + len > a->zout_end) { + if (len > a->zout_end - zout) { if (!stbi__zexpand(a, zout, len)) return 0; zout = a->zout; } @@ -3970,7 +4358,7 @@ static int stbi__parse_huffman_block(stbi__zbuf *a) static int stbi__compute_huffman_codes(stbi__zbuf *a) { - static stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 }; + static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 }; stbi__zhuffman z_codelength; stbi_uc lencodes[286+32+137];//padding for maximum single op stbi_uc codelength_sizes[19]; @@ -4000,11 +4388,12 @@ static int stbi__compute_huffman_codes(stbi__zbuf *a) c = stbi__zreceive(a,2)+3; if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG"); fill = lencodes[n-1]; - } else if (c == 17) + } else if (c == 17) { c = stbi__zreceive(a,3)+3; - else { - STBI_ASSERT(c == 18); + } else if (c == 18) { c = stbi__zreceive(a,7)+11; + } else { + return stbi__err("bad codelengths", "Corrupt PNG"); } if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG"); memset(lencodes+n, fill, c); @@ -4030,7 +4419,7 @@ static int stbi__parse_uncompressed_block(stbi__zbuf *a) a->code_buffer >>= 8; a->num_bits -= 8; } - STBI_ASSERT(a->num_bits == 0); + if (a->num_bits < 0) return stbi__err("zlib corrupt","Corrupt PNG"); // now fill header the normal way while (k < 4) header[k++] = stbi__zget8(a); @@ -4052,6 +4441,7 @@ static int stbi__parse_zlib_header(stbi__zbuf *a) int cm = cmf & 15; /* int cinfo = cmf >> 4; */ int flg = stbi__zget8(a); + if (stbi__zeof(a)) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png @@ -4059,7 +4449,7 @@ static int stbi__parse_zlib_header(stbi__zbuf *a) return 1; } -static const stbi_uc stbi__zdefault_length[288] = +static const stbi_uc stbi__zdefault_length[STBI__ZNSYMS] = { 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, @@ -4095,6 +4485,7 @@ static int stbi__parse_zlib(stbi__zbuf *a, int parse_header) if (!stbi__parse_zlib_header(a)) return 0; a->num_bits = 0; a->code_buffer = 0; + a->hit_zeof_once = 0; do { final = stbi__zreceive(a,1); type = stbi__zreceive(a,2); @@ -4105,7 +4496,7 @@ static int stbi__parse_zlib(stbi__zbuf *a, int parse_header) } else { if (type == 1) { // use fixed code lengths - if (!stbi__zbuild_huffman(&a->z_length , stbi__zdefault_length , 288)) return 0; + if (!stbi__zbuild_huffman(&a->z_length , stbi__zdefault_length , STBI__ZNSYMS)) return 0; if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance, 32)) return 0; } else { if (!stbi__compute_huffman_codes(a)) return 0; @@ -4229,7 +4620,7 @@ static stbi__pngchunk stbi__get_chunk_header(stbi__context *s) static int stbi__check_png_header(stbi__context *s) { - static stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 }; + static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 }; int i; for (i=0; i < 8; ++i) if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG"); @@ -4250,9 +4641,8 @@ enum { STBI__F_up=2, STBI__F_avg=3, STBI__F_paeth=4, - // synthetic filters used for first scanline to avoid needing a dummy row of 0s - STBI__F_avg_first, - STBI__F_paeth_first + // synthetic filter used for first scanline to avoid needing a dummy row of 0s + STBI__F_avg_first }; static stbi_uc first_row_filter[5] = @@ -4261,29 +4651,56 @@ static stbi_uc first_row_filter[5] = STBI__F_sub, STBI__F_none, STBI__F_avg_first, - STBI__F_paeth_first + STBI__F_sub // Paeth with b=c=0 turns out to be equivalent to sub }; static int stbi__paeth(int a, int b, int c) { - int p = a + b - c; - int pa = abs(p-a); - int pb = abs(p-b); - int pc = abs(p-c); - if (pa <= pb && pa <= pc) return a; - if (pb <= pc) return b; - return c; + // This formulation looks very different from the reference in the PNG spec, but is + // actually equivalent and has favorable data dependencies and admits straightforward + // generation of branch-free code, which helps performance significantly. + int thresh = c*3 - (a + b); + int lo = a < b ? a : b; + int hi = a < b ? b : a; + int t0 = (hi <= thresh) ? lo : c; + int t1 = (thresh <= lo) ? hi : t0; + return t1; } -static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 }; +static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 }; + +// adds an extra all-255 alpha channel +// dest == src is legal +// img_n must be 1 or 3 +static void stbi__create_png_alpha_expand8(stbi_uc *dest, stbi_uc *src, stbi__uint32 x, int img_n) +{ + int i; + // must process data backwards since we allow dest==src + if (img_n == 1) { + for (i=x-1; i >= 0; --i) { + dest[i*2+1] = 255; + dest[i*2+0] = src[i]; + } + } else { + STBI_ASSERT(img_n == 3); + for (i=x-1; i >= 0; --i) { + dest[i*4+3] = 255; + dest[i*4+2] = src[i*3+2]; + dest[i*4+1] = src[i*3+1]; + dest[i*4+0] = src[i*3+0]; + } + } +} // create the png data from post-deflated data static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color) { - int bytes = (depth == 16? 2 : 1); + int bytes = (depth == 16 ? 2 : 1); stbi__context *s = a->s; stbi__uint32 i,j,stride = x*out_n*bytes; stbi__uint32 img_len, img_width_bytes; + stbi_uc *filter_buf; + int all_ok = 1; int k; int img_n = s->img_n; // copy it into a local for later @@ -4295,196 +4712,149 @@ static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 r a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into if (!a->out) return stbi__err("outofmem", "Out of memory"); + // note: error exits here don't need to clean up a->out individually, + // stbi__do_png always does on error. + if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG"); img_width_bytes = (((img_n * x * depth) + 7) >> 3); + if (!stbi__mad2sizes_valid(img_width_bytes, y, img_width_bytes)) return stbi__err("too large", "Corrupt PNG"); img_len = (img_width_bytes + 1) * y; + // we used to check for exact match between raw_len and img_len on non-interlaced PNGs, // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros), // so just check for raw_len < img_len always. if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG"); + // Allocate two scan lines worth of filter workspace buffer. + filter_buf = (stbi_uc *) stbi__malloc_mad2(img_width_bytes, 2, 0); + if (!filter_buf) return stbi__err("outofmem", "Out of memory"); + + // Filtering for low-bit-depth images + if (depth < 8) { + filter_bytes = 1; + width = img_width_bytes; + } + for (j=0; j < y; ++j) { - stbi_uc *cur = a->out + stride*j; - stbi_uc *prior; + // cur/prior filter buffers alternate + stbi_uc *cur = filter_buf + (j & 1)*img_width_bytes; + stbi_uc *prior = filter_buf + (~j & 1)*img_width_bytes; + stbi_uc *dest = a->out + stride*j; + int nk = width * filter_bytes; int filter = *raw++; - if (filter > 4) - return stbi__err("invalid filter","Corrupt PNG"); - - if (depth < 8) { - STBI_ASSERT(img_width_bytes <= x); - cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place - filter_bytes = 1; - width = img_width_bytes; + // check filter type + if (filter > 4) { + all_ok = stbi__err("invalid filter","Corrupt PNG"); + break; } - prior = cur - stride; // bugfix: need to compute this after 'cur +=' computation above // if first row, use special filter that doesn't sample previous row if (j == 0) filter = first_row_filter[filter]; - // handle first byte explicitly - for (k=0; k < filter_bytes; ++k) { - switch (filter) { - case STBI__F_none : cur[k] = raw[k]; break; - case STBI__F_sub : cur[k] = raw[k]; break; - case STBI__F_up : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break; - case STBI__F_avg : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break; - case STBI__F_paeth : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break; - case STBI__F_avg_first : cur[k] = raw[k]; break; - case STBI__F_paeth_first: cur[k] = raw[k]; break; - } - } - - if (depth == 8) { - if (img_n != out_n) - cur[img_n] = 255; // first pixel - raw += img_n; - cur += out_n; - prior += out_n; - } else if (depth == 16) { - if (img_n != out_n) { - cur[filter_bytes] = 255; // first pixel top byte - cur[filter_bytes+1] = 255; // first pixel bottom byte - } - raw += filter_bytes; - cur += output_bytes; - prior += output_bytes; - } else { - raw += 1; - cur += 1; - prior += 1; + // perform actual filtering + switch (filter) { + case STBI__F_none: + memcpy(cur, raw, nk); + break; + case STBI__F_sub: + memcpy(cur, raw, filter_bytes); + for (k = filter_bytes; k < nk; ++k) + cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); + break; + case STBI__F_up: + for (k = 0; k < nk; ++k) + cur[k] = STBI__BYTECAST(raw[k] + prior[k]); + break; + case STBI__F_avg: + for (k = 0; k < filter_bytes; ++k) + cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); + for (k = filter_bytes; k < nk; ++k) + cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); + break; + case STBI__F_paeth: + for (k = 0; k < filter_bytes; ++k) + cur[k] = STBI__BYTECAST(raw[k] + prior[k]); // prior[k] == stbi__paeth(0,prior[k],0) + for (k = filter_bytes; k < nk; ++k) + cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes], prior[k], prior[k-filter_bytes])); + break; + case STBI__F_avg_first: + memcpy(cur, raw, filter_bytes); + for (k = filter_bytes; k < nk; ++k) + cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); + break; } - // this is a little gross, so that we don't switch per-pixel or per-component - if (depth < 8 || img_n == out_n) { - int nk = (width - 1)*filter_bytes; - #define STBI__CASE(f) \ - case f: \ - for (k=0; k < nk; ++k) - switch (filter) { - // "none" filter turns into a memcpy here; make that explicit. - case STBI__F_none: memcpy(cur, raw, nk); break; - STBI__CASE(STBI__F_sub) { cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); } break; - STBI__CASE(STBI__F_up) { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break; - STBI__CASE(STBI__F_avg) { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); } break; - STBI__CASE(STBI__F_paeth) { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); } break; - STBI__CASE(STBI__F_avg_first) { cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); } break; - STBI__CASE(STBI__F_paeth_first) { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); } break; - } - #undef STBI__CASE - raw += nk; - } else { - STBI_ASSERT(img_n+1 == out_n); - #define STBI__CASE(f) \ - case f: \ - for (i=x-1; i >= 1; --i, cur[filter_bytes]=255,raw+=filter_bytes,cur+=output_bytes,prior+=output_bytes) \ - for (k=0; k < filter_bytes; ++k) - switch (filter) { - STBI__CASE(STBI__F_none) { cur[k] = raw[k]; } break; - STBI__CASE(STBI__F_sub) { cur[k] = STBI__BYTECAST(raw[k] + cur[k- output_bytes]); } break; - STBI__CASE(STBI__F_up) { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break; - STBI__CASE(STBI__F_avg) { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k- output_bytes])>>1)); } break; - STBI__CASE(STBI__F_paeth) { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],prior[k],prior[k- output_bytes])); } break; - STBI__CASE(STBI__F_avg_first) { cur[k] = STBI__BYTECAST(raw[k] + (cur[k- output_bytes] >> 1)); } break; - STBI__CASE(STBI__F_paeth_first) { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],0,0)); } break; - } - #undef STBI__CASE - - // the loop above sets the high byte of the pixels' alpha, but for - // 16 bit png files we also need the low byte set. we'll do that here. - if (depth == 16) { - cur = a->out + stride*j; // start at the beginning of the row again - for (i=0; i < x; ++i,cur+=output_bytes) { - cur[filter_bytes+1] = 255; - } - } - } - } + raw += nk; - // we make a separate pass to expand bits to pixels; for performance, - // this could run two scanlines behind the above code, so it won't - // intefere with filtering but will still be in the cache. - if (depth < 8) { - for (j=0; j < y; ++j) { - stbi_uc *cur = a->out + stride*j; - stbi_uc *in = a->out + stride*j + x*out_n - img_width_bytes; - // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit - // png guarante byte alignment, if width is not multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop + // expand decoded bits in cur to dest, also adding an extra alpha channel if desired + if (depth < 8) { stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range + stbi_uc *in = cur; + stbi_uc *out = dest; + stbi_uc inb = 0; + stbi__uint32 nsmp = x*img_n; - // note that the final byte might overshoot and write more data than desired. - // we can allocate enough data that this never writes out of memory, but it - // could also overwrite the next scanline. can it overwrite non-empty data - // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel. - // so we need to explicitly clamp the final ones - + // expand bits to bytes first if (depth == 4) { - for (k=x*img_n; k >= 2; k-=2, ++in) { - *cur++ = scale * ((*in >> 4) ); - *cur++ = scale * ((*in ) & 0x0f); + for (i=0; i < nsmp; ++i) { + if ((i & 1) == 0) inb = *in++; + *out++ = scale * (inb >> 4); + inb <<= 4; } - if (k > 0) *cur++ = scale * ((*in >> 4) ); } else if (depth == 2) { - for (k=x*img_n; k >= 4; k-=4, ++in) { - *cur++ = scale * ((*in >> 6) ); - *cur++ = scale * ((*in >> 4) & 0x03); - *cur++ = scale * ((*in >> 2) & 0x03); - *cur++ = scale * ((*in ) & 0x03); + for (i=0; i < nsmp; ++i) { + if ((i & 3) == 0) inb = *in++; + *out++ = scale * (inb >> 6); + inb <<= 2; } - if (k > 0) *cur++ = scale * ((*in >> 6) ); - if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03); - if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03); - } else if (depth == 1) { - for (k=x*img_n; k >= 8; k-=8, ++in) { - *cur++ = scale * ((*in >> 7) ); - *cur++ = scale * ((*in >> 6) & 0x01); - *cur++ = scale * ((*in >> 5) & 0x01); - *cur++ = scale * ((*in >> 4) & 0x01); - *cur++ = scale * ((*in >> 3) & 0x01); - *cur++ = scale * ((*in >> 2) & 0x01); - *cur++ = scale * ((*in >> 1) & 0x01); - *cur++ = scale * ((*in ) & 0x01); + } else { + STBI_ASSERT(depth == 1); + for (i=0; i < nsmp; ++i) { + if ((i & 7) == 0) inb = *in++; + *out++ = scale * (inb >> 7); + inb <<= 1; } - if (k > 0) *cur++ = scale * ((*in >> 7) ); - if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01); - if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01); - if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01); - if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01); - if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01); - if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01); } - if (img_n != out_n) { - int q; - // insert alpha = 255 - cur = a->out + stride*j; + + // insert alpha=255 values if desired + if (img_n != out_n) + stbi__create_png_alpha_expand8(dest, dest, x, img_n); + } else if (depth == 8) { + if (img_n == out_n) + memcpy(dest, cur, x*img_n); + else + stbi__create_png_alpha_expand8(dest, cur, x, img_n); + } else if (depth == 16) { + // convert the image data from big-endian to platform-native + stbi__uint16 *dest16 = (stbi__uint16*)dest; + stbi__uint32 nsmp = x*img_n; + + if (img_n == out_n) { + for (i = 0; i < nsmp; ++i, ++dest16, cur += 2) + *dest16 = (cur[0] << 8) | cur[1]; + } else { + STBI_ASSERT(img_n+1 == out_n); if (img_n == 1) { - for (q=x-1; q >= 0; --q) { - cur[q*2+1] = 255; - cur[q*2+0] = cur[q]; + for (i = 0; i < x; ++i, dest16 += 2, cur += 2) { + dest16[0] = (cur[0] << 8) | cur[1]; + dest16[1] = 0xffff; } } else { STBI_ASSERT(img_n == 3); - for (q=x-1; q >= 0; --q) { - cur[q*4+3] = 255; - cur[q*4+2] = cur[q*3+2]; - cur[q*4+1] = cur[q*3+1]; - cur[q*4+0] = cur[q*3+0]; + for (i = 0; i < x; ++i, dest16 += 4, cur += 6) { + dest16[0] = (cur[0] << 8) | cur[1]; + dest16[1] = (cur[2] << 8) | cur[3]; + dest16[2] = (cur[4] << 8) | cur[5]; + dest16[3] = 0xffff; } } } } - } else if (depth == 16) { - // force the image data from big-endian to platform-native. - // this is done in a separate pass due to the decoding relying - // on the data being untouched, but could probably be done - // per-line during decode if care is taken. - stbi_uc *cur = a->out; - stbi__uint16 *cur16 = (stbi__uint16*)cur; - - for(i=0; i < x*y*out_n; ++i,cur16++,cur+=2) { - *cur16 = (cur[0] << 8) | cur[1]; - } } + STBI_FREE(filter_buf); + if (!all_ok) return 0; + return 1; } @@ -4499,6 +4869,7 @@ static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint3 // de-interlacing final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0); + if (!final) return stbi__err("outofmem", "Out of memory"); for (p=0; p < 7; ++p) { int xorig[] = { 0,4,0,2,0,1,0 }; int yorig[] = { 0,0,4,0,2,0,1 }; @@ -4619,19 +4990,46 @@ static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int return 1; } -static int stbi__unpremultiply_on_load = 0; -static int stbi__de_iphone_flag = 0; +static int stbi__unpremultiply_on_load_global = 0; +static int stbi__de_iphone_flag_global = 0; STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply) { - stbi__unpremultiply_on_load = flag_true_if_should_unpremultiply; + stbi__unpremultiply_on_load_global = flag_true_if_should_unpremultiply; } STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert) { - stbi__de_iphone_flag = flag_true_if_should_convert; + stbi__de_iphone_flag_global = flag_true_if_should_convert; +} + +#ifndef STBI_THREAD_LOCAL +#define stbi__unpremultiply_on_load stbi__unpremultiply_on_load_global +#define stbi__de_iphone_flag stbi__de_iphone_flag_global +#else +static STBI_THREAD_LOCAL int stbi__unpremultiply_on_load_local, stbi__unpremultiply_on_load_set; +static STBI_THREAD_LOCAL int stbi__de_iphone_flag_local, stbi__de_iphone_flag_set; + +STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply) +{ + stbi__unpremultiply_on_load_local = flag_true_if_should_unpremultiply; + stbi__unpremultiply_on_load_set = 1; } +STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert) +{ + stbi__de_iphone_flag_local = flag_true_if_should_convert; + stbi__de_iphone_flag_set = 1; +} + +#define stbi__unpremultiply_on_load (stbi__unpremultiply_on_load_set \ + ? stbi__unpremultiply_on_load_local \ + : stbi__unpremultiply_on_load_global) +#define stbi__de_iphone_flag (stbi__de_iphone_flag_set \ + ? stbi__de_iphone_flag_local \ + : stbi__de_iphone_flag_global) +#endif // STBI_THREAD_LOCAL + static void stbi__de_iphone(stbi__png *z) { stbi__context *s = z->s; @@ -4675,12 +5073,12 @@ static void stbi__de_iphone(stbi__png *z) } } -#define STBI__PNG_TYPE(a,b,c,d) (((a) << 24) + ((b) << 16) + ((c) << 8) + (d)) +#define STBI__PNG_TYPE(a,b,c,d) (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d)) static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp) { stbi_uc palette[1024], pal_img_n=0; - stbi_uc has_trans=0, tc[3]; + stbi_uc has_trans=0, tc[3]={0}; stbi__uint16 tc16[3]; stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0; int first=1,k,interlace=0, color=0, is_iphone=0; @@ -4706,8 +5104,10 @@ static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp) if (!first) return stbi__err("multiple IHDR","Corrupt PNG"); first = 0; if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG"); - s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)"); - s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)"); + s->img_x = stbi__get32be(s); + s->img_y = stbi__get32be(s); + if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)"); + if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)"); z->depth = stbi__get8(s); if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16) return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only"); color = stbi__get8(s); if (color > 6) return stbi__err("bad ctype","Corrupt PNG"); if (color == 3 && z->depth == 16) return stbi__err("bad ctype","Corrupt PNG"); @@ -4719,14 +5119,13 @@ static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp) if (!pal_img_n) { s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0); if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode"); - if (scan == STBI__SCAN_header) return 1; } else { // if paletted, then pal_n is our final components, and // img_n is # components to decompress/filter. s->img_n = 1; if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG"); - // if SCAN_header, have to scan to see if we have a tRNS } + // even with SCAN_header, have to scan to see if we have a tRNS break; } @@ -4758,10 +5157,14 @@ static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp) if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG"); if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG"); has_trans = 1; + // non-paletted with tRNS = constant alpha. if header-scanning, we can stop now. + if (scan == STBI__SCAN_header) { ++s->img_n; return 1; } if (z->depth == 16) { - for (k = 0; k < s->img_n; ++k) tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is + for (k = 0; k < s->img_n && k < 3; ++k) // extra loop test to suppress false GCC warning + tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is } else { - for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger + for (k = 0; k < s->img_n && k < 3; ++k) + tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger } } break; @@ -4770,7 +5173,13 @@ static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp) case STBI__PNG_TYPE('I','D','A','T'): { if (first) return stbi__err("first not IHDR", "Corrupt PNG"); if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG"); - if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; } + if (scan == STBI__SCAN_header) { + // header scan definitely stops at first IDAT + if (pal_img_n) + s->img_n = pal_img_n; + return 1; + } + if (c.length > (1u << 30)) return stbi__err("IDAT size limit", "IDAT section larger than 2^30 bytes"); if ((int)(ioff + c.length) < (int)ioff) return 0; if (ioff + c.length > idata_limit) { stbi__uint32 idata_limit_old = idata_limit; @@ -4824,6 +5233,8 @@ static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp) ++s->img_n; } STBI_FREE(z->expanded); z->expanded = NULL; + // end of PNG chunk, read and skip CRC + stbi__get32be(s); return 1; } @@ -4854,10 +5265,12 @@ static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, st void *result=NULL; if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error"); if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) { - if (p->depth < 8) + if (p->depth <= 8) ri->bits_per_channel = 8; + else if (p->depth == 16) + ri->bits_per_channel = 16; else - ri->bits_per_channel = p->depth; + return stbi__errpuc("bad bits_per_channel", "PNG not supported: unsupported color depth"); result = p->out; p->out = NULL; if (req_comp && req_comp != p->s->img_out_n) { @@ -4912,6 +5325,19 @@ static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp) p.s = s; return stbi__png_info_raw(&p, x, y, comp); } + +static int stbi__png_is16(stbi__context *s) +{ + stbi__png p; + p.s = s; + if (!stbi__png_info_raw(&p, NULL, NULL, NULL)) + return 0; + if (p.depth != 16) { + stbi__rewind(p.s); + return 0; + } + return 1; +} #endif // Microsoft/Windows BMP image @@ -4945,11 +5371,11 @@ static int stbi__high_bit(unsigned int z) { int n=0; if (z == 0) return -1; - if (z >= 0x10000) n += 16, z >>= 16; - if (z >= 0x00100) n += 8, z >>= 8; - if (z >= 0x00010) n += 4, z >>= 4; - if (z >= 0x00004) n += 2, z >>= 2; - if (z >= 0x00002) n += 1, z >>= 1; + if (z >= 0x10000) { n += 16; z >>= 16; } + if (z >= 0x00100) { n += 8; z >>= 8; } + if (z >= 0x00010) { n += 4; z >>= 4; } + if (z >= 0x00004) { n += 2; z >>= 2; } + if (z >= 0x00002) { n += 1;/* >>= 1;*/ } return n; } @@ -4963,29 +5389,62 @@ static int stbi__bitcount(unsigned int a) return a & 0xff; } -static int stbi__shiftsigned(int v, int shift, int bits) -{ - int result; - int z=0; - - if (shift < 0) v <<= -shift; - else v >>= shift; - result = v; - - z = bits; - while (z < 8) { - result += v >> z; - z += bits; - } - return result; +// extract an arbitrarily-aligned N-bit value (N=bits) +// from v, and then make it 8-bits long and fractionally +// extend it to full full range. +static int stbi__shiftsigned(unsigned int v, int shift, int bits) +{ + static unsigned int mul_table[9] = { + 0, + 0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/, + 0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/, + }; + static unsigned int shift_table[9] = { + 0, 0,0,1,0,2,4,6,0, + }; + if (shift < 0) + v <<= -shift; + else + v >>= shift; + STBI_ASSERT(v < 256); + v >>= (8-bits); + STBI_ASSERT(bits >= 0 && bits <= 8); + return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits]; } typedef struct { int bpp, offset, hsz; unsigned int mr,mg,mb,ma, all_a; + int extra_read; } stbi__bmp_data; +static int stbi__bmp_set_mask_defaults(stbi__bmp_data *info, int compress) +{ + // BI_BITFIELDS specifies masks explicitly, don't override + if (compress == 3) + return 1; + + if (compress == 0) { + if (info->bpp == 16) { + info->mr = 31u << 10; + info->mg = 31u << 5; + info->mb = 31u << 0; + } else if (info->bpp == 32) { + info->mr = 0xffu << 16; + info->mg = 0xffu << 8; + info->mb = 0xffu << 0; + info->ma = 0xffu << 24; + info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0 + } else { + // otherwise, use defaults, which is all-0 + info->mr = info->mg = info->mb = info->ma = 0; + } + return 1; + } + return 0; // error +} + static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info) { int hsz; @@ -4996,6 +5455,9 @@ static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info) info->offset = stbi__get32le(s); info->hsz = hsz = stbi__get32le(s); info->mr = info->mg = info->mb = info->ma = 0; + info->extra_read = 14; + + if (info->offset < 0) return stbi__errpuc("bad BMP", "bad BMP"); if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown"); if (hsz == 12) { @@ -5007,10 +5469,11 @@ static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info) } if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP"); info->bpp = stbi__get16le(s); - if (info->bpp == 1) return stbi__errpuc("monochrome", "BMP type not supported: 1-bit"); if (hsz != 12) { int compress = stbi__get32le(s); if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE"); + if (compress >= 4) return stbi__errpuc("BMP JPEG/PNG", "BMP type not supported: unsupported compression"); // this includes PNG/JPEG modes + if (compress == 3 && info->bpp != 16 && info->bpp != 32) return stbi__errpuc("bad BMP", "bad BMP"); // bitfields requires 16 or 32 bits/pixel stbi__get32le(s); // discard sizeof stbi__get32le(s); // discard hres stbi__get32le(s); // discard vres @@ -5025,21 +5488,12 @@ static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info) } if (info->bpp == 16 || info->bpp == 32) { if (compress == 0) { - if (info->bpp == 32) { - info->mr = 0xffu << 16; - info->mg = 0xffu << 8; - info->mb = 0xffu << 0; - info->ma = 0xffu << 24; - info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0 - } else { - info->mr = 31u << 10; - info->mg = 31u << 5; - info->mb = 31u << 0; - } + stbi__bmp_set_mask_defaults(info, compress); } else if (compress == 3) { info->mr = stbi__get32le(s); info->mg = stbi__get32le(s); info->mb = stbi__get32le(s); + info->extra_read += 12; // not documented, but generated by photoshop and handled by mspaint if (info->mr == info->mg && info->mg == info->mb) { // ?!?!? @@ -5049,6 +5503,7 @@ static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info) return stbi__errpuc("bad BMP", "bad BMP"); } } else { + // V4/V5 header int i; if (hsz != 108 && hsz != 124) return stbi__errpuc("bad BMP", "bad BMP"); @@ -5056,6 +5511,8 @@ static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info) info->mg = stbi__get32le(s); info->mb = stbi__get32le(s); info->ma = stbi__get32le(s); + if (compress != 3) // override mr/mg/mb unless in BI_BITFIELDS mode, as per docs + stbi__bmp_set_mask_defaults(info, compress); stbi__get32le(s); // discard color space for (i=0; i < 12; ++i) stbi__get32le(s); // discard color space parameters @@ -5088,6 +5545,9 @@ static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req flip_vertically = ((int) s->img_y) > 0; s->img_y = abs((int) s->img_y); + if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)"); + if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)"); + mr = info.mr; mg = info.mg; mb = info.mb; @@ -5096,13 +5556,35 @@ static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req if (info.hsz == 12) { if (info.bpp < 24) - psize = (info.offset - 14 - 24) / 3; + psize = (info.offset - info.extra_read - 24) / 3; } else { if (info.bpp < 16) - psize = (info.offset - 14 - info.hsz) >> 2; + psize = (info.offset - info.extra_read - info.hsz) >> 2; + } + if (psize == 0) { + // accept some number of extra bytes after the header, but if the offset points either to before + // the header ends or implies a large amount of extra data, reject the file as malformed + int bytes_read_so_far = s->callback_already_read + (int)(s->img_buffer - s->img_buffer_original); + int header_limit = 1024; // max we actually read is below 256 bytes currently. + int extra_data_limit = 256*4; // what ordinarily goes here is a palette; 256 entries*4 bytes is its max size. + if (bytes_read_so_far <= 0 || bytes_read_so_far > header_limit) { + return stbi__errpuc("bad header", "Corrupt BMP"); + } + // we established that bytes_read_so_far is positive and sensible. + // the first half of this test rejects offsets that are either too small positives, or + // negative, and guarantees that info.offset >= bytes_read_so_far > 0. this in turn + // ensures the number computed in the second half of the test can't overflow. + if (info.offset < bytes_read_so_far || info.offset - bytes_read_so_far > extra_data_limit) { + return stbi__errpuc("bad offset", "Corrupt BMP"); + } else { + stbi__skip(s, info.offset - bytes_read_so_far); + } } - s->img_n = ma ? 4 : 3; + if (info.bpp == 24 && ma == 0xff000000) + s->img_n = 3; + else + s->img_n = ma ? 4 : 3; if (req_comp && req_comp >= 3) // we can directly decode 3 or 4 target = req_comp; else @@ -5124,36 +5606,56 @@ static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req if (info.hsz != 12) stbi__get8(s); pal[i][3] = 255; } - stbi__skip(s, info.offset - 14 - info.hsz - psize * (info.hsz == 12 ? 3 : 4)); - if (info.bpp == 4) width = (s->img_x + 1) >> 1; + stbi__skip(s, info.offset - info.extra_read - info.hsz - psize * (info.hsz == 12 ? 3 : 4)); + if (info.bpp == 1) width = (s->img_x + 7) >> 3; + else if (info.bpp == 4) width = (s->img_x + 1) >> 1; else if (info.bpp == 8) width = s->img_x; else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); } pad = (-width)&3; - for (j=0; j < (int) s->img_y; ++j) { - for (i=0; i < (int) s->img_x; i += 2) { - int v=stbi__get8(s),v2=0; - if (info.bpp == 4) { - v2 = v & 15; - v >>= 4; + if (info.bpp == 1) { + for (j=0; j < (int) s->img_y; ++j) { + int bit_offset = 7, v = stbi__get8(s); + for (i=0; i < (int) s->img_x; ++i) { + int color = (v>>bit_offset)&0x1; + out[z++] = pal[color][0]; + out[z++] = pal[color][1]; + out[z++] = pal[color][2]; + if (target == 4) out[z++] = 255; + if (i+1 == (int) s->img_x) break; + if((--bit_offset) < 0) { + bit_offset = 7; + v = stbi__get8(s); + } } - out[z++] = pal[v][0]; - out[z++] = pal[v][1]; - out[z++] = pal[v][2]; - if (target == 4) out[z++] = 255; - if (i+1 == (int) s->img_x) break; - v = (info.bpp == 8) ? stbi__get8(s) : v2; - out[z++] = pal[v][0]; - out[z++] = pal[v][1]; - out[z++] = pal[v][2]; - if (target == 4) out[z++] = 255; + stbi__skip(s, pad); + } + } else { + for (j=0; j < (int) s->img_y; ++j) { + for (i=0; i < (int) s->img_x; i += 2) { + int v=stbi__get8(s),v2=0; + if (info.bpp == 4) { + v2 = v & 15; + v >>= 4; + } + out[z++] = pal[v][0]; + out[z++] = pal[v][1]; + out[z++] = pal[v][2]; + if (target == 4) out[z++] = 255; + if (i+1 == (int) s->img_x) break; + v = (info.bpp == 8) ? stbi__get8(s) : v2; + out[z++] = pal[v][0]; + out[z++] = pal[v][1]; + out[z++] = pal[v][2]; + if (target == 4) out[z++] = 255; + } + stbi__skip(s, pad); } - stbi__skip(s, pad); } } else { int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0; int z = 0; int easy=0; - stbi__skip(s, info.offset - 14 - info.hsz); + stbi__skip(s, info.offset - info.extra_read - info.hsz); if (info.bpp == 24) width = 3 * s->img_x; else if (info.bpp == 16) width = 2*s->img_x; else /* bpp = 32 and pad = 0 */ width=0; @@ -5171,6 +5673,7 @@ static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg); bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb); ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma); + if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); } } for (j=0; j < (int) s->img_y; ++j) { if (easy) { @@ -5188,7 +5691,7 @@ static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req int bpp = info.bpp; for (i=0; i < (int) s->img_x; ++i) { stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s)); - int a; + unsigned int a; out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount)); out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount)); out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount)); @@ -5212,7 +5715,7 @@ static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req stbi_uc *p1 = out + j *s->img_x*target; stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target; for (i=0; i < (int) s->img_x*target; ++i) { - t = p1[i], p1[i] = p2[i], p2[i] = t; + t = p1[i]; p1[i] = p2[i]; p2[i] = t; } } } @@ -5236,14 +5739,14 @@ static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16) { // only RGB or RGBA (incl. 16bit) or grey allowed - if(is_rgb16) *is_rgb16 = 0; + if (is_rgb16) *is_rgb16 = 0; switch(bits_per_pixel) { case 8: return STBI_grey; case 16: if(is_grey) return STBI_grey_alpha; - // else: fall-through + // fallthrough case 15: if(is_rgb16) *is_rgb16 = 1; - return STBI_rgb; - case 24: // fall-through + return STBI_rgb; + case 24: // fallthrough case 32: return bits_per_pixel/8; default: return 0; } @@ -5392,6 +5895,11 @@ static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req int RLE_repeating = 0; int read_next_pixel = 1; STBI_NOTUSED(ri); + STBI_NOTUSED(tga_x_origin); // @TODO + STBI_NOTUSED(tga_y_origin); // @TODO + + if (tga_height > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)"); + if (tga_width > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)"); // do a tiny bit of precessing if ( tga_image_type >= 8 ) @@ -5432,6 +5940,11 @@ static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req // do I need to load a palette? if ( tga_indexed) { + if (tga_palette_len == 0) { /* you have to have at least one entry! */ + STBI_FREE(tga_data); + return stbi__errpuc("bad palette", "Corrupt TGA"); + } + // any data to skip? (offset usually = 0) stbi__skip(s, tga_palette_start ); // load the palette @@ -5555,6 +6068,7 @@ static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req // Microsoft's C compilers happy... [8^( tga_palette_start = tga_palette_len = tga_palette_bits = tga_x_origin = tga_y_origin = 0; + STBI_NOTUSED(tga_palette_start); // OK, done return tga_data; } @@ -5639,6 +6153,9 @@ static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req h = stbi__get32be(s); w = stbi__get32be(s); + if (h > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)"); + if (w > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)"); + // Make sure the depth is 8 bits. bitdepth = stbi__get16be(s); if (bitdepth != 8 && bitdepth != 16) @@ -5702,7 +6219,7 @@ static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req // Else if n is 128, noop. // Endloop - // The RLE-compressed data is preceeded by a 2-byte data count for each row in the data, + // The RLE-compressed data is preceded by a 2-byte data count for each row in the data, // which we're going to just skip. stbi__skip(s, h * channelCount * 2 ); @@ -5993,6 +6510,10 @@ static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_c x = stbi__get16be(s); y = stbi__get16be(s); + + if (y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)"); + if (x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)"); + if (stbi__at_eof(s)) return stbi__errpuc("bad file","file too short (pic header)"); if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode"); @@ -6002,6 +6523,7 @@ static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_c // intermediate buffer is RGBA result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0); + if (!result) return stbi__errpuc("outofmem", "Out of memory"); memset(result, 0xff, x*y*4); if (!stbi__pic_load_core(s,x,y,comp, result)) { @@ -6038,11 +6560,13 @@ typedef struct typedef struct { int w,h; - stbi_uc *out, *old_out; // output buffer (always 4 components) - int flags, bgindex, ratio, transparent, eflags, delay; + stbi_uc *out; // output buffer (always 4 components) + stbi_uc *background; // The current "background" as far as a gif is concerned + stbi_uc *history; + int flags, bgindex, ratio, transparent, eflags; stbi_uc pal[256][4]; stbi_uc lpal[256][4]; - stbi__gif_lzw codes[4096]; + stbi__gif_lzw codes[8192]; stbi_uc *color_table; int parse, step; int lflags; @@ -6050,6 +6574,7 @@ typedef struct int max_x, max_y; int cur_x, cur_y; int line_size; + int delay; } stbi__gif; static int stbi__gif_test_raw(stbi__context *s) @@ -6098,6 +6623,9 @@ static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_in g->ratio = stbi__get8(s); g->transparent = -1; + if (g->w > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)"); + if (g->h > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)"); + if (comp != 0) *comp = 4; // can't actually tell whether it's 3 or 4 until we parse the comments if (is_info) return 1; @@ -6111,6 +6639,7 @@ static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_in static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp) { stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif)); + if (!g) return stbi__err("outofmem", "Out of memory"); if (!stbi__gif_header(s, g, comp, 1)) { STBI_FREE(g); stbi__rewind( s ); @@ -6125,6 +6654,7 @@ static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp) static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code) { stbi_uc *p, *c; + int idx; // recurse to decode the prefixes, since the linked-list is backwards, // and working backwards through an interleaved image would be nasty @@ -6133,10 +6663,12 @@ static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code) if (g->cur_y >= g->max_y) return; - p = &g->out[g->cur_x + g->cur_y]; - c = &g->color_table[g->codes[code].suffix * 4]; + idx = g->cur_x + g->cur_y; + p = &g->out[idx]; + g->history[idx / 4] = 1; - if (c[3] >= 128) { + c = &g->color_table[g->codes[code].suffix * 4]; + if (c[3] > 128) { // don't render transparent pixels; p[0] = c[2]; p[1] = c[1]; p[2] = c[0]; @@ -6210,11 +6742,16 @@ static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g) stbi__skip(s,len); return g->out; } else if (code <= avail) { - if (first) return stbi__errpuc("no clear code", "Corrupt GIF"); + if (first) { + return stbi__errpuc("no clear code", "Corrupt GIF"); + } if (oldcode >= 0) { p = &g->codes[avail++]; - if (avail > 4096) return stbi__errpuc("too many codes", "Corrupt GIF"); + if (avail > 8192) { + return stbi__errpuc("too many codes", "Corrupt GIF"); + } + p->prefix = (stbi__int16) oldcode; p->first = g->codes[oldcode].first; p->suffix = (code == avail) ? p->first : g->codes[code].first; @@ -6236,62 +6773,77 @@ static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g) } } -static void stbi__fill_gif_background(stbi__gif *g, int x0, int y0, int x1, int y1) -{ - int x, y; - stbi_uc *c = g->pal[g->bgindex]; - for (y = y0; y < y1; y += 4 * g->w) { - for (x = x0; x < x1; x += 4) { - stbi_uc *p = &g->out[y + x]; - p[0] = c[2]; - p[1] = c[1]; - p[2] = c[0]; - p[3] = 0; - } - } -} - // this function is designed to support animated gifs, although stb_image doesn't support it -static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp) +// two back is the image from two frames ago, used for a very specific disposal format +static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back) { - int i; - stbi_uc *prev_out = 0; - - if (g->out == 0 && !stbi__gif_header(s, g, comp,0)) - return 0; // stbi__g_failure_reason set by stbi__gif_header + int dispose; + int first_frame; + int pi; + int pcount; + STBI_NOTUSED(req_comp); - if (!stbi__mad3sizes_valid(g->w, g->h, 4, 0)) - return stbi__errpuc("too large", "GIF too large"); + // on first frame, any non-written pixels get the background colour (non-transparent) + first_frame = 0; + if (g->out == 0) { + if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header + if (!stbi__mad3sizes_valid(4, g->w, g->h, 0)) + return stbi__errpuc("too large", "GIF image is too large"); + pcount = g->w * g->h; + g->out = (stbi_uc *) stbi__malloc(4 * pcount); + g->background = (stbi_uc *) stbi__malloc(4 * pcount); + g->history = (stbi_uc *) stbi__malloc(pcount); + if (!g->out || !g->background || !g->history) + return stbi__errpuc("outofmem", "Out of memory"); + + // image is treated as "transparent" at the start - ie, nothing overwrites the current background; + // background colour is only used for pixels that are not rendered first frame, after that "background" + // color refers to the color that was there the previous frame. + memset(g->out, 0x00, 4 * pcount); + memset(g->background, 0x00, 4 * pcount); // state of the background (starts transparent) + memset(g->history, 0x00, pcount); // pixels that were affected previous frame + first_frame = 1; + } else { + // second frame - how do we dispose of the previous one? + dispose = (g->eflags & 0x1C) >> 2; + pcount = g->w * g->h; - prev_out = g->out; - g->out = (stbi_uc *) stbi__malloc_mad3(4, g->w, g->h, 0); - if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory"); + if ((dispose == 3) && (two_back == 0)) { + dispose = 2; // if I don't have an image to revert back to, default to the old background + } - switch ((g->eflags & 0x1C) >> 2) { - case 0: // unspecified (also always used on 1st frame) - stbi__fill_gif_background(g, 0, 0, 4 * g->w, 4 * g->w * g->h); - break; - case 1: // do not dispose - if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h); - g->old_out = prev_out; - break; - case 2: // dispose to background - if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h); - stbi__fill_gif_background(g, g->start_x, g->start_y, g->max_x, g->max_y); - break; - case 3: // dispose to previous - if (g->old_out) { - for (i = g->start_y; i < g->max_y; i += 4 * g->w) - memcpy(&g->out[i + g->start_x], &g->old_out[i + g->start_x], g->max_x - g->start_x); + if (dispose == 3) { // use previous graphic + for (pi = 0; pi < pcount; ++pi) { + if (g->history[pi]) { + memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 ); + } } - break; + } else if (dispose == 2) { + // restore what was changed last frame to background before that frame; + for (pi = 0; pi < pcount; ++pi) { + if (g->history[pi]) { + memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 ); + } + } + } else { + // This is a non-disposal case eithe way, so just + // leave the pixels as is, and they will become the new background + // 1: do not dispose + // 0: not specified. + } + + // background is what out is after the undoing of the previou frame; + memcpy( g->background, g->out, 4 * g->w * g->h ); } + // clear my history; + memset( g->history, 0x00, g->w * g->h ); // pixels that were affected previous frame + for (;;) { - switch (stbi__get8(s)) { + int tag = stbi__get8(s); + switch (tag) { case 0x2C: /* Image Descriptor */ { - int prev_trans = -1; stbi__int32 x, y, w, h; stbi_uc *o; @@ -6310,6 +6862,13 @@ static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, i g->cur_x = g->start_x; g->cur_y = g->start_y; + // if the width of the specified rectangle is 0, that means + // we may not see *any* pixels or the image is malformed; + // to make sure this is caught, move the current y down to + // max_y (which is what out_gif_code checks). + if (w == 0) + g->cur_y = g->max_y; + g->lflags = stbi__get8(s); if (g->lflags & 0x40) { @@ -6324,19 +6883,24 @@ static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, i stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1); g->color_table = (stbi_uc *) g->lpal; } else if (g->flags & 0x80) { - if (g->transparent >= 0 && (g->eflags & 0x01)) { - prev_trans = g->pal[g->transparent][3]; - g->pal[g->transparent][3] = 0; - } g->color_table = (stbi_uc *) g->pal; } else return stbi__errpuc("missing color table", "Corrupt GIF"); o = stbi__process_gif_raster(s, g); - if (o == NULL) return NULL; - - if (prev_trans != -1) - g->pal[g->transparent][3] = (stbi_uc) prev_trans; + if (!o) return NULL; + + // if this was the first frame, + pcount = g->w * g->h; + if (first_frame && (g->bgindex > 0)) { + // if first frame, any pixel not drawn to gets the background color + for (pi = 0; pi < pcount; ++pi) { + if (g->history[pi] == 0) { + g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be; + memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 ); + } + } + } return o; } @@ -6344,19 +6908,35 @@ static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, i case 0x21: // Comment Extension. { int len; - if (stbi__get8(s) == 0xF9) { // Graphic Control Extension. + int ext = stbi__get8(s); + if (ext == 0xF9) { // Graphic Control Extension. len = stbi__get8(s); if (len == 4) { g->eflags = stbi__get8(s); - g->delay = stbi__get16le(s); - g->transparent = stbi__get8(s); + g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths. + + // unset old transparent + if (g->transparent >= 0) { + g->pal[g->transparent][3] = 255; + } + if (g->eflags & 0x01) { + g->transparent = stbi__get8(s); + if (g->transparent >= 0) { + g->pal[g->transparent][3] = 0; + } + } else { + // don't need transparent + stbi__skip(s, 1); + g->transparent = -1; + } } else { stbi__skip(s, len); break; } } - while ((len = stbi__get8(s)) != 0) + while ((len = stbi__get8(s)) != 0) { stbi__skip(s, len); + } break; } @@ -6367,28 +6947,130 @@ static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, i return stbi__errpuc("unknown code", "Corrupt GIF"); } } +} - STBI_NOTUSED(req_comp); +static void *stbi__load_gif_main_outofmem(stbi__gif *g, stbi_uc *out, int **delays) +{ + STBI_FREE(g->out); + STBI_FREE(g->history); + STBI_FREE(g->background); + + if (out) STBI_FREE(out); + if (delays && *delays) STBI_FREE(*delays); + return stbi__errpuc("outofmem", "Out of memory"); +} + +static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp) +{ + if (stbi__gif_test(s)) { + int layers = 0; + stbi_uc *u = 0; + stbi_uc *out = 0; + stbi_uc *two_back = 0; + stbi__gif g; + int stride; + int out_size = 0; + int delays_size = 0; + + STBI_NOTUSED(out_size); + STBI_NOTUSED(delays_size); + + memset(&g, 0, sizeof(g)); + if (delays) { + *delays = 0; + } + + do { + u = stbi__gif_load_next(s, &g, comp, req_comp, two_back); + if (u == (stbi_uc *) s) u = 0; // end of animated gif marker + + if (u) { + *x = g.w; + *y = g.h; + ++layers; + stride = g.w * g.h * 4; + + if (out) { + void *tmp = (stbi_uc*) STBI_REALLOC_SIZED( out, out_size, layers * stride ); + if (!tmp) + return stbi__load_gif_main_outofmem(&g, out, delays); + else { + out = (stbi_uc*) tmp; + out_size = layers * stride; + } + + if (delays) { + int *new_delays = (int*) STBI_REALLOC_SIZED( *delays, delays_size, sizeof(int) * layers ); + if (!new_delays) + return stbi__load_gif_main_outofmem(&g, out, delays); + *delays = new_delays; + delays_size = layers * sizeof(int); + } + } else { + out = (stbi_uc*)stbi__malloc( layers * stride ); + if (!out) + return stbi__load_gif_main_outofmem(&g, out, delays); + out_size = layers * stride; + if (delays) { + *delays = (int*) stbi__malloc( layers * sizeof(int) ); + if (!*delays) + return stbi__load_gif_main_outofmem(&g, out, delays); + delays_size = layers * sizeof(int); + } + } + memcpy( out + ((layers - 1) * stride), u, stride ); + if (layers >= 2) { + two_back = out - 2 * stride; + } + + if (delays) { + (*delays)[layers - 1U] = g.delay; + } + } + } while (u != 0); + + // free temp buffer; + STBI_FREE(g.out); + STBI_FREE(g.history); + STBI_FREE(g.background); + + // do the final conversion after loading everything; + if (req_comp && req_comp != 4) + out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h); + + *z = layers; + return out; + } else { + return stbi__errpuc("not GIF", "Image was not as a gif type."); + } } static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri) { stbi_uc *u = 0; - stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif)); - memset(g, 0, sizeof(*g)); + stbi__gif g; + memset(&g, 0, sizeof(g)); STBI_NOTUSED(ri); - u = stbi__gif_load_next(s, g, comp, req_comp); + u = stbi__gif_load_next(s, &g, comp, req_comp, 0); if (u == (stbi_uc *) s) u = 0; // end of animated gif marker if (u) { - *x = g->w; - *y = g->h; + *x = g.w; + *y = g.h; + + // moved conversion to after successful load so that the same + // can be done for multiple frames. if (req_comp && req_comp != 4) - u = stbi__convert_format(u, 4, req_comp, g->w, g->h); + u = stbi__convert_format(u, 4, req_comp, g.w, g.h); + } else if (g.out) { + // if there was an error and we allocated an image buffer, free it! + STBI_FREE(g.out); } - else if (g->out) - STBI_FREE(g->out); - STBI_FREE(g); + + // free buffers needed for multiple frame loading; + STBI_FREE(g.history); + STBI_FREE(g.background); + return u; } @@ -6512,6 +7194,9 @@ static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int re token += 3; width = (int) strtol(token, NULL, 10); + if (height > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)"); + if (width > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)"); + *x = width; *y = height; @@ -6580,12 +7265,12 @@ static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int re // Run value = stbi__get8(s); count -= 128; - if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); } + if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); } for (z = 0; z < count; ++z) scanline[i++ * 4 + k] = value; } else { // Dump - if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); } + if ((count == 0) || (count > nleft)) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); } for (z = 0; z < count; ++z) scanline[i++ * 4 + k] = stbi__get8(s); } @@ -6654,12 +7339,18 @@ static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp) info.all_a = 255; p = stbi__bmp_parse_header(s, &info); - stbi__rewind( s ); - if (p == NULL) + if (p == NULL) { + stbi__rewind( s ); return 0; + } if (x) *x = s->img_x; if (y) *y = s->img_y; - if (comp) *comp = info.ma ? 4 : 3; + if (comp) { + if (info.bpp == 24 && info.ma == 0xff000000) + *comp = 3; + else + *comp = info.ma ? 4 : 3; + } return 1; } #endif @@ -6667,7 +7358,7 @@ static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp) #ifndef STBI_NO_PSD static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp) { - int channelCount, dummy; + int channelCount, dummy, depth; if (!x) x = &dummy; if (!y) y = &dummy; if (!comp) comp = &dummy; @@ -6687,7 +7378,8 @@ static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp) } *y = stbi__get32be(s); *x = stbi__get32be(s); - if (stbi__get16be(s) != 8) { + depth = stbi__get16be(s); + if (depth != 8 && depth != 16) { stbi__rewind( s ); return 0; } @@ -6698,6 +7390,33 @@ static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp) *comp = 4; return 1; } + +static int stbi__psd_is16(stbi__context *s) +{ + int channelCount, depth; + if (stbi__get32be(s) != 0x38425053) { + stbi__rewind( s ); + return 0; + } + if (stbi__get16be(s) != 1) { + stbi__rewind( s ); + return 0; + } + stbi__skip(s, 6); + channelCount = stbi__get16be(s); + if (channelCount < 0 || channelCount > 16) { + stbi__rewind( s ); + return 0; + } + STBI_NOTUSED(stbi__get32be(s)); + STBI_NOTUSED(stbi__get32be(s)); + depth = stbi__get16be(s); + if (depth != 16) { + stbi__rewind( s ); + return 0; + } + return 1; +} #endif #ifndef STBI_NO_PIC @@ -6769,7 +7488,6 @@ static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp) // Known limitations: // Does not support comments in the header section // Does not support ASCII image data (formats P2 and P3) -// Does not support 16-bit-per-channel #ifndef STBI_NO_PNM @@ -6790,22 +7508,33 @@ static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req stbi_uc *out; STBI_NOTUSED(ri); - if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n)) + ri->bits_per_channel = stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n); + if (ri->bits_per_channel == 0) return 0; + if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)"); + if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)"); + *x = s->img_x; *y = s->img_y; if (comp) *comp = s->img_n; - if (!stbi__mad3sizes_valid(s->img_n, s->img_x, s->img_y, 0)) + if (!stbi__mad4sizes_valid(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0)) return stbi__errpuc("too large", "PNM too large"); - out = (stbi_uc *) stbi__malloc_mad3(s->img_n, s->img_x, s->img_y, 0); + out = (stbi_uc *) stbi__malloc_mad4(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0); if (!out) return stbi__errpuc("outofmem", "Out of memory"); - stbi__getn(s, out, s->img_n * s->img_x * s->img_y); + if (!stbi__getn(s, out, s->img_n * s->img_x * s->img_y * (ri->bits_per_channel / 8))) { + STBI_FREE(out); + return stbi__errpuc("bad PNM", "PNM file truncated"); + } if (req_comp && req_comp != s->img_n) { - out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y); + if (ri->bits_per_channel == 16) { + out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, s->img_n, req_comp, s->img_x, s->img_y); + } else { + out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y); + } if (out == NULL) return out; // stbi__convert_format frees input on failure } return out; @@ -6842,6 +7571,8 @@ static int stbi__pnm_getinteger(stbi__context *s, char *c) while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) { value = value*10 + (*c - '0'); *c = (char) stbi__get8(s); + if((value > 214748364) || (value == 214748364 && *c > '7')) + return stbi__err("integer parse overflow", "Parsing an integer in the PPM header overflowed a 32-bit int"); } return value; @@ -6872,17 +7603,29 @@ static int stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp) stbi__pnm_skip_whitespace(s, &c); *x = stbi__pnm_getinteger(s, &c); // read width + if(*x == 0) + return stbi__err("invalid width", "PPM image header had zero or overflowing width"); stbi__pnm_skip_whitespace(s, &c); *y = stbi__pnm_getinteger(s, &c); // read height + if (*y == 0) + return stbi__err("invalid width", "PPM image header had zero or overflowing width"); stbi__pnm_skip_whitespace(s, &c); maxv = stbi__pnm_getinteger(s, &c); // read max value - - if (maxv > 255) - return stbi__err("max value > 255", "PPM image not 8-bit"); + if (maxv > 65535) + return stbi__err("max value > 65535", "PPM image supports only 8-bit and 16-bit images"); + else if (maxv > 255) + return 16; else - return 1; + return 8; +} + +static int stbi__pnm_is16(stbi__context *s) +{ + if (stbi__pnm_info(s, NULL, NULL, NULL) == 16) + return 1; + return 0; } #endif @@ -6928,6 +7671,22 @@ static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp) return stbi__err("unknown image type", "Image not of any known type, or corrupt"); } +static int stbi__is_16_main(stbi__context *s) +{ + #ifndef STBI_NO_PNG + if (stbi__png_is16(s)) return 1; + #endif + + #ifndef STBI_NO_PSD + if (stbi__psd_is16(s)) return 1; + #endif + + #ifndef STBI_NO_PNM + if (stbi__pnm_is16(s)) return 1; + #endif + return 0; +} + #ifndef STBI_NO_STDIO STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp) { @@ -6949,6 +7708,27 @@ STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp) fseek(f,pos,SEEK_SET); return r; } + +STBIDEF int stbi_is_16_bit(char const *filename) +{ + FILE *f = stbi__fopen(filename, "rb"); + int result; + if (!f) return stbi__err("can't fopen", "Unable to open file"); + result = stbi_is_16_bit_from_file(f); + fclose(f); + return result; +} + +STBIDEF int stbi_is_16_bit_from_file(FILE *f) +{ + int r; + stbi__context s; + long pos = ftell(f); + stbi__start_file(&s, f); + r = stbi__is_16_main(&s); + fseek(f,pos,SEEK_SET); + return r; +} #endif // !STBI_NO_STDIO STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp) @@ -6965,10 +7745,31 @@ STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int return stbi__info_main(&s,x,y,comp); } +STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len) +{ + stbi__context s; + stbi__start_mem(&s,buffer,len); + return stbi__is_16_main(&s); +} + +STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user) +{ + stbi__context s; + stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user); + return stbi__is_16_main(&s); +} + #endif // STB_IMAGE_IMPLEMENTATION /* revision history: + 2.20 (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs + 2.19 (2018-02-11) fix warning + 2.18 (2018-01-30) fix warnings + 2.17 (2018-01-29) change sbti__shiftsigned to avoid clang -O2 bug + 1-bit BMP + *_is_16_bit api + avoid warnings 2.16 (2017-07-23) all functions have 16-bit variants; STBI_NO_STDIO works again; compilation fixes; diff --git a/src/stb_image_write.h b/3rdparty/stb/include/stb_image_write.h similarity index 73% rename from src/stb_image_write.h rename to 3rdparty/stb/include/stb_image_write.h index 9d553e0d7d0..e4b32ed1bc3 100644 --- a/src/stb_image_write.h +++ b/3rdparty/stb/include/stb_image_write.h @@ -1,4 +1,4 @@ -/* stb_image_write - v1.07 - public domain - http://nothings.org/stb/stb_image_write.h +/* stb_image_write - v1.16 - public domain - http://nothings.org/stb writes out PNG/BMP/TGA/JPEG/HDR images to C stdio - Sean Barrett 2010-2015 no warranty implied; use at your own risk @@ -12,32 +12,47 @@ ABOUT: - This header file is a library for writing images to C stdio. It could be - adapted to write to memory or a general streaming interface; let me know. + This header file is a library for writing images to C stdio or a callback. The PNG output is not optimal; it is 20-50% larger than the file - written by a decent optimizing implementation. This library is designed - for source code compactness and simplicity, not optimal image file size - or run-time performance. + written by a decent optimizing implementation; though providing a custom + zlib compress function (see STBIW_ZLIB_COMPRESS) can mitigate that. + This library is designed for source code compactness and simplicity, + not optimal image file size or run-time performance. BUILDING: You can #define STBIW_ASSERT(x) before the #include to avoid using assert.h. You can #define STBIW_MALLOC(), STBIW_REALLOC(), and STBIW_FREE() to replace malloc,realloc,free. - You can define STBIW_MEMMOVE() to replace memmove() + You can #define STBIW_MEMMOVE() to replace memmove() + You can #define STBIW_ZLIB_COMPRESS to use a custom zlib-style compress function + for PNG compression (instead of the builtin one), it must have the following signature: + unsigned char * my_compress(unsigned char *data, int data_len, int *out_len, int quality); + The returned data will be freed with STBIW_FREE() (free() by default), + so it must be heap allocated with STBIW_MALLOC() (malloc() by default), + +UNICODE: + + If compiling for Windows and you wish to use Unicode filenames, compile + with + #define STBIW_WINDOWS_UTF8 + and pass utf8-encoded filenames. Call stbiw_convert_wchar_to_utf8 to convert + Windows wchar_t filenames to utf8. USAGE: - There are four functions, one for each image file format: + There are five functions, one for each image file format: int stbi_write_png(char const *filename, int w, int h, int comp, const void *data, int stride_in_bytes); int stbi_write_bmp(char const *filename, int w, int h, int comp, const void *data); int stbi_write_tga(char const *filename, int w, int h, int comp, const void *data); + int stbi_write_jpg(char const *filename, int w, int h, int comp, const void *data, int quality); int stbi_write_hdr(char const *filename, int w, int h, int comp, const float *data); - int stbi_write_jpg(char const *filename, int w, int h, int comp, const float *data); - There are also four equivalent functions that use an arbitrary write function. You are + void stbi_flip_vertically_on_write(int flag); // flag is non-zero to flip data vertically + + There are also five equivalent functions that use an arbitrary write function. You are expected to open/close your file-equivalent before and after calling these: int stbi_write_png_to_func(stbi_write_func *func, void *context, int w, int h, int comp, const void *data, int stride_in_bytes); @@ -49,6 +64,12 @@ where the callback is: void stbi_write_func(void *context, void *data, int size); + You can configure it with these global variables: + int stbi_write_tga_with_rle; // defaults to true; set to 0 to disable RLE + int stbi_write_png_compression_level; // defaults to 8; set to higher for more compression + int stbi_write_force_png_filter; // defaults to -1; set to 0..5 to force a filter mode + + You can define STBI_WRITE_NO_STDIO to disable the file variant of these functions, so the library will not use stdio.h at all. However, this will also disable HDR writing, because it requires stdio for formatted output. @@ -75,34 +96,33 @@ writer, both because it is in BGR order and because it may have padding at the end of the line.) + PNG allows you to set the deflate compression level by setting the global + variable 'stbi_write_png_compression_level' (it defaults to 8). + HDR expects linear float data. Since the format is always 32-bit rgb(e) data, alpha (if provided) is discarded, and for monochrome data it is replicated across all three channels. TGA supports RLE or non-RLE compressed data. To use non-RLE-compressed data, set the global variable 'stbi_write_tga_with_rle' to 0. - + JPEG does ignore alpha channels in input data; quality is between 1 and 100. Higher quality looks better but results in a bigger image. JPEG baseline (no JPEG progressive). CREDITS: - PNG/BMP/TGA - Sean Barrett - HDR - Baldur Karlsson - TGA monochrome: - Jean-Sebastien Guay - misc enhancements: - Tim Kelsey - TGA RLE - Alan Hickman - initial file IO callback implementation - Emmanuel Julien - JPEG - Jon Olick (original jo_jpeg.cpp code) - Daniel Gibson + + Sean Barrett - PNG/BMP/TGA + Baldur Karlsson - HDR + Jean-Sebastien Guay - TGA monochrome + Tim Kelsey - misc enhancements + Alan Hickman - TGA RLE + Emmanuel Julien - initial file IO callback implementation + Jon Olick - original jo_jpeg.cpp code + Daniel Gibson - integrate JPEG, allow external zlib + Aarni Koskela - allow choosing PNG filter + bugfixes: github:Chribba Guillaume Chereau @@ -114,7 +134,14 @@ Thatcher Ulrich github:poppolopoppo Patrick Boettcher - + github:xeekworx + Cap Petschulat + Simon Rodriguez + Ivan Tikhonov + github:ignotion + Adam Schackart + Andrew Kensler + LICENSE See end of file for license information. @@ -124,15 +151,25 @@ LICENSE #ifndef INCLUDE_STB_IMAGE_WRITE_H #define INCLUDE_STB_IMAGE_WRITE_H -#ifdef __cplusplus -extern "C" { -#endif +#include +// if STB_IMAGE_WRITE_STATIC causes problems, try defining STBIWDEF to 'inline' or 'static inline' +#ifndef STBIWDEF #ifdef STB_IMAGE_WRITE_STATIC -#define STBIWDEF static +#define STBIWDEF static +#else +#ifdef __cplusplus +#define STBIWDEF extern "C" #else -#define STBIWDEF extern -extern int stbi_write_tga_with_rle; +#define STBIWDEF extern +#endif +#endif +#endif + +#ifndef STB_IMAGE_WRITE_STATIC // C++ forbids static forward declarations +STBIWDEF int stbi_write_tga_with_rle; +STBIWDEF int stbi_write_png_compression_level; +STBIWDEF int stbi_write_force_png_filter; #endif #ifndef STBI_WRITE_NO_STDIO @@ -141,6 +178,10 @@ STBIWDEF int stbi_write_bmp(char const *filename, int w, int h, int comp, const STBIWDEF int stbi_write_tga(char const *filename, int w, int h, int comp, const void *data); STBIWDEF int stbi_write_hdr(char const *filename, int w, int h, int comp, const float *data); STBIWDEF int stbi_write_jpg(char const *filename, int x, int y, int comp, const void *data, int quality); + +#ifdef STBIW_WINDOWS_UTF8 +STBIWDEF int stbiw_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input); +#endif #endif typedef void stbi_write_func(void *context, void *data, int size); @@ -151,9 +192,7 @@ STBIWDEF int stbi_write_tga_to_func(stbi_write_func *func, void *context, int w, STBIWDEF int stbi_write_hdr_to_func(stbi_write_func *func, void *context, int w, int h, int comp, const float *data); STBIWDEF int stbi_write_jpg_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const void *data, int quality); -#ifdef __cplusplus -} -#endif +STBIWDEF void stbi_flip_vertically_on_write(int flip_boolean); #endif//INCLUDE_STB_IMAGE_WRITE_H @@ -208,10 +247,29 @@ STBIWDEF int stbi_write_jpg_to_func(stbi_write_func *func, void *context, int x, #define STBIW_UCHAR(x) (unsigned char) ((x) & 0xff) +#ifdef STB_IMAGE_WRITE_STATIC +static int stbi_write_png_compression_level = 8; +static int stbi_write_tga_with_rle = 1; +static int stbi_write_force_png_filter = -1; +#else +int stbi_write_png_compression_level = 8; +int stbi_write_tga_with_rle = 1; +int stbi_write_force_png_filter = -1; +#endif + +static int stbi__flip_vertically_on_write = 0; + +STBIWDEF void stbi_flip_vertically_on_write(int flag) +{ + stbi__flip_vertically_on_write = flag; +} + typedef struct { stbi_write_func *func; void *context; + unsigned char buffer[64]; + int buf_used; } stbi__write_context; // initialize a callback-based context @@ -228,9 +286,52 @@ static void stbi__stdio_write(void *context, void *data, int size) fwrite(data,1,size,(FILE*) context); } +#if defined(_WIN32) && defined(STBIW_WINDOWS_UTF8) +#ifdef __cplusplus +#define STBIW_EXTERN extern "C" +#else +#define STBIW_EXTERN extern +#endif +STBIW_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide); +STBIW_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default); + +STBIWDEF int stbiw_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input) +{ + return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL); +} +#endif + +static FILE *stbiw__fopen(char const *filename, char const *mode) +{ + FILE *f; +#if defined(_WIN32) && defined(STBIW_WINDOWS_UTF8) + wchar_t wMode[64]; + wchar_t wFilename[1024]; + if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename))) + return 0; + + if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode))) + return 0; + +#if defined(_MSC_VER) && _MSC_VER >= 1400 + if (0 != _wfopen_s(&f, wFilename, wMode)) + f = 0; +#else + f = _wfopen(wFilename, wMode); +#endif + +#elif defined(_MSC_VER) && _MSC_VER >= 1400 + if (0 != fopen_s(&f, filename, mode)) + f=0; +#else + f = fopen(filename, mode); +#endif + return f; +} + static int stbi__start_write_file(stbi__write_context *s, const char *filename) { - FILE *f = fopen(filename, "wb"); + FILE *f = stbiw__fopen(filename, "wb"); stbi__start_write_callbacks(s, stbi__stdio_write, (void *) f); return f != NULL; } @@ -245,12 +346,6 @@ static void stbi__end_write_file(stbi__write_context *s) typedef unsigned int stbiw_uint32; typedef int stb_image_write_test[sizeof(stbiw_uint32)==4 ? 1 : -1]; -#ifdef STB_IMAGE_WRITE_STATIC -static int stbi_write_tga_with_rle = 1; -#else -int stbi_write_tga_with_rle = 1; -#endif - static void stbiw__writefv(stbi__write_context *s, const char *fmt, va_list v) { while (*fmt) { @@ -288,16 +383,36 @@ static void stbiw__writef(stbi__write_context *s, const char *fmt, ...) va_end(v); } +static void stbiw__write_flush(stbi__write_context *s) +{ + if (s->buf_used) { + s->func(s->context, &s->buffer, s->buf_used); + s->buf_used = 0; + } +} + static void stbiw__putc(stbi__write_context *s, unsigned char c) { s->func(s->context, &c, 1); } +static void stbiw__write1(stbi__write_context *s, unsigned char a) +{ + if ((size_t)s->buf_used + 1 > sizeof(s->buffer)) + stbiw__write_flush(s); + s->buffer[s->buf_used++] = a; +} + static void stbiw__write3(stbi__write_context *s, unsigned char a, unsigned char b, unsigned char c) { - unsigned char arr[3]; - arr[0] = a, arr[1] = b, arr[2] = c; - s->func(s->context, arr, 3); + int n; + if ((size_t)s->buf_used + 3 > sizeof(s->buffer)) + stbiw__write_flush(s); + n = s->buf_used; + s->buf_used = n+3; + s->buffer[n+0] = a; + s->buffer[n+1] = b; + s->buffer[n+2] = c; } static void stbiw__write_pixel(stbi__write_context *s, int rgb_dir, int comp, int write_alpha, int expand_mono, unsigned char *d) @@ -306,7 +421,7 @@ static void stbiw__write_pixel(stbi__write_context *s, int rgb_dir, int comp, in int k; if (write_alpha < 0) - s->func(s->context, &d[comp - 1], 1); + stbiw__write1(s, d[comp - 1]); switch (comp) { case 2: // 2 pixels = mono + alpha, alpha is written separately, so same as 1-channel case @@ -314,7 +429,7 @@ static void stbiw__write_pixel(stbi__write_context *s, int rgb_dir, int comp, in if (expand_mono) stbiw__write3(s, d[0], d[0], d[0]); // monochrome bmp else - s->func(s->context, d, 1); // monochrome TGA + stbiw__write1(s, d[0]); // monochrome TGA break; case 4: if (!write_alpha) { @@ -330,7 +445,7 @@ static void stbiw__write_pixel(stbi__write_context *s, int rgb_dir, int comp, in break; } if (write_alpha > 0) - s->func(s->context, &d[comp - 1], 1); + stbiw__write1(s, d[comp - 1]); } static void stbiw__write_pixels(stbi__write_context *s, int rgb_dir, int vdir, int x, int y, int comp, void *data, int write_alpha, int scanline_pad, int expand_mono) @@ -341,16 +456,21 @@ static void stbiw__write_pixels(stbi__write_context *s, int rgb_dir, int vdir, i if (y <= 0) return; - if (vdir < 0) - j_end = -1, j = y-1; - else - j_end = y, j = 0; + if (stbi__flip_vertically_on_write) + vdir *= -1; + + if (vdir < 0) { + j_end = -1; j = y-1; + } else { + j_end = y; j = 0; + } for (; j != j_end; j += vdir) { for (i=0; i < x; ++i) { unsigned char *d = (unsigned char *) data + (j*x+i)*comp; stbiw__write_pixel(s, rgb_dir, comp, write_alpha, expand_mono, d); } + stbiw__write_flush(s); s->func(s->context, &zero, scanline_pad); } } @@ -371,16 +491,27 @@ static int stbiw__outfile(stbi__write_context *s, int rgb_dir, int vdir, int x, static int stbi_write_bmp_core(stbi__write_context *s, int x, int y, int comp, const void *data) { - int pad = (-x*3) & 3; - return stbiw__outfile(s,-1,-1,x,y,comp,1,(void *) data,0,pad, - "11 4 22 4" "4 44 22 444444", - 'B', 'M', 14+40+(x*3+pad)*y, 0,0, 14+40, // file header - 40, x,y, 1,24, 0,0,0,0,0,0); // bitmap header + if (comp != 4) { + // write RGB bitmap + int pad = (-x*3) & 3; + return stbiw__outfile(s,-1,-1,x,y,comp,1,(void *) data,0,pad, + "11 4 22 4" "4 44 22 444444", + 'B', 'M', 14+40+(x*3+pad)*y, 0,0, 14+40, // file header + 40, x,y, 1,24, 0,0,0,0,0,0); // bitmap header + } else { + // RGBA bitmaps need a v4 header + // use BI_BITFIELDS mode with 32bpp and alpha mask + // (straight BI_RGB with alpha mask doesn't work in most readers) + return stbiw__outfile(s,-1,-1,x,y,comp,1,(void *)data,1,0, + "11 4 22 4" "4 44 22 444444 4444 4 444 444 444 444", + 'B', 'M', 14+108+x*y*4, 0, 0, 14+108, // file header + 108, x,y, 1,32, 3,0,0,0,0,0, 0xff0000,0xff00,0xff,0xff000000u, 0, 0,0,0, 0,0,0, 0,0,0, 0,0,0); // bitmap V4 header + } } STBIWDEF int stbi_write_bmp_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const void *data) { - stbi__write_context s; + stbi__write_context s = { 0 }; stbi__start_write_callbacks(&s, func, context); return stbi_write_bmp_core(&s, x, y, comp, data); } @@ -388,7 +519,7 @@ STBIWDEF int stbi_write_bmp_to_func(stbi_write_func *func, void *context, int x, #ifndef STBI_WRITE_NO_STDIO STBIWDEF int stbi_write_bmp(char const *filename, int x, int y, int comp, const void *data) { - stbi__write_context s; + stbi__write_context s = { 0 }; if (stbi__start_write_file(&s,filename)) { int r = stbi_write_bmp_core(&s, x, y, comp, data); stbi__end_write_file(&s); @@ -412,11 +543,21 @@ static int stbi_write_tga_core(stbi__write_context *s, int x, int y, int comp, v "111 221 2222 11", 0, 0, format, 0, 0, 0, 0, 0, x, y, (colorbytes + has_alpha) * 8, has_alpha * 8); } else { int i,j,k; + int jend, jdir; stbiw__writef(s, "111 221 2222 11", 0,0,format+8, 0,0,0, 0,0,x,y, (colorbytes + has_alpha) * 8, has_alpha * 8); - for (j = y - 1; j >= 0; --j) { - unsigned char *row = (unsigned char *) data + j * x * comp; + if (stbi__flip_vertically_on_write) { + j = 0; + jend = y; + jdir = 1; + } else { + j = y-1; + jend = -1; + jdir = -1; + } + for (; j != jend; j += jdir) { + unsigned char *row = (unsigned char *) data + j * x * comp; int len; for (i = 0; i < x; i += len) { @@ -451,24 +592,25 @@ static int stbi_write_tga_core(stbi__write_context *s, int x, int y, int comp, v if (diff) { unsigned char header = STBIW_UCHAR(len - 1); - s->func(s->context, &header, 1); + stbiw__write1(s, header); for (k = 0; k < len; ++k) { stbiw__write_pixel(s, -1, comp, has_alpha, 0, begin + k * comp); } } else { unsigned char header = STBIW_UCHAR(len - 129); - s->func(s->context, &header, 1); + stbiw__write1(s, header); stbiw__write_pixel(s, -1, comp, has_alpha, 0, begin); } } } + stbiw__write_flush(s); } return 1; } STBIWDEF int stbi_write_tga_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const void *data) { - stbi__write_context s; + stbi__write_context s = { 0 }; stbi__start_write_callbacks(&s, func, context); return stbi_write_tga_core(&s, x, y, comp, (void *) data); } @@ -476,7 +618,7 @@ STBIWDEF int stbi_write_tga_to_func(stbi_write_func *func, void *context, int x, #ifndef STBI_WRITE_NO_STDIO STBIWDEF int stbi_write_tga(char const *filename, int x, int y, int comp, const void *data) { - stbi__write_context s; + stbi__write_context s = { 0 }; if (stbi__start_write_file(&s,filename)) { int r = stbi_write_tga_core(&s, x, y, comp, (void *) data); stbi__end_write_file(&s); @@ -492,7 +634,9 @@ STBIWDEF int stbi_write_tga(char const *filename, int x, int y, int comp, const #define stbiw__max(a, b) ((a) > (b) ? (a) : (b)) -void stbiw__linear_to_rgbe(unsigned char *rgbe, float *linear) +#ifndef STBI_WRITE_NO_STDIO + +static void stbiw__linear_to_rgbe(unsigned char *rgbe, float *linear) { int exponent; float maxcomp = stbiw__max(linear[0], stbiw__max(linear[1], linear[2])); @@ -509,7 +653,7 @@ void stbiw__linear_to_rgbe(unsigned char *rgbe, float *linear) } } -void stbiw__write_run_data(stbi__write_context *s, int length, unsigned char databyte) +static void stbiw__write_run_data(stbi__write_context *s, int length, unsigned char databyte) { unsigned char lengthbyte = STBIW_UCHAR(length+128); STBIW_ASSERT(length+128 <= 255); @@ -517,7 +661,7 @@ void stbiw__write_run_data(stbi__write_context *s, int length, unsigned char dat s->func(s->context, &databyte, 1); } -void stbiw__write_dump_data(stbi__write_context *s, int length, unsigned char *data) +static void stbiw__write_dump_data(stbi__write_context *s, int length, unsigned char *data) { unsigned char lengthbyte = STBIW_UCHAR(length); STBIW_ASSERT(length <= 128); // inconsistent with spec but consistent with official code @@ -525,7 +669,7 @@ void stbiw__write_dump_data(stbi__write_context *s, int length, unsigned char *d s->func(s->context, data, length); } -void stbiw__write_hdr_scanline(stbi__write_context *s, int width, int ncomp, unsigned char *scratch, float *scanline) +static void stbiw__write_hdr_scanline(stbi__write_context *s, int width, int ncomp, unsigned char *scratch, float *scanline) { unsigned char scanlineheader[4] = { 2, 2, 0, 0 }; unsigned char rgbe[4]; @@ -626,11 +770,15 @@ static int stbi_write_hdr_core(stbi__write_context *s, int x, int y, int comp, f char header[] = "#?RADIANCE\n# Written by stb_image_write.h\nFORMAT=32-bit_rle_rgbe\n"; s->func(s->context, header, sizeof(header)-1); +#ifdef __STDC_LIB_EXT1__ + len = sprintf_s(buffer, sizeof(buffer), "EXPOSURE= 1.0000000000000\n\n-Y %d +X %d\n", y, x); +#else len = sprintf(buffer, "EXPOSURE= 1.0000000000000\n\n-Y %d +X %d\n", y, x); +#endif s->func(s->context, buffer, len); for(i=0; i < y; i++) - stbiw__write_hdr_scanline(s, x, comp, scratch, data + comp*i*x); + stbiw__write_hdr_scanline(s, x, comp, scratch, data + comp*x*(stbi__flip_vertically_on_write ? y-1-i : i)); STBIW_FREE(scratch); return 1; } @@ -638,15 +786,14 @@ static int stbi_write_hdr_core(stbi__write_context *s, int x, int y, int comp, f STBIWDEF int stbi_write_hdr_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const float *data) { - stbi__write_context s; + stbi__write_context s = { 0 }; stbi__start_write_callbacks(&s, func, context); return stbi_write_hdr_core(&s, x, y, comp, (float *) data); } -#ifndef STBI_WRITE_NO_STDIO STBIWDEF int stbi_write_hdr(char const *filename, int x, int y, int comp, const float *data) { - stbi__write_context s; + stbi__write_context s = { 0 }; if (stbi__start_write_file(&s,filename)) { int r = stbi_write_hdr_core(&s, x, y, comp, (float *) data); stbi__end_write_file(&s); @@ -662,8 +809,9 @@ STBIWDEF int stbi_write_hdr(char const *filename, int x, int y, int comp, const // PNG writer // +#ifndef STBIW_ZLIB_COMPRESS // stretchy buffer; stbiw__sbpush() == vector<>::push_back() -- stbiw__sbcount() == vector<>::size() -#define stbiw__sbraw(a) ((int *) (a) - 2) +#define stbiw__sbraw(a) ((int *) (void *) (a) - 2) #define stbiw__sbm(a) stbiw__sbraw(a)[0] #define stbiw__sbn(a) stbiw__sbraw(a)[1] @@ -742,8 +890,14 @@ static unsigned int stbiw__zhash(unsigned char *data) #define stbiw__ZHASH 16384 -unsigned char * stbi_zlib_compress(unsigned char *data, int data_len, int *out_len, int quality) +#endif // STBIW_ZLIB_COMPRESS + +STBIWDEF unsigned char * stbi_zlib_compress(unsigned char *data, int data_len, int *out_len, int quality) { +#ifdef STBIW_ZLIB_COMPRESS + // user provided a zlib compress implementation, use that + return STBIW_ZLIB_COMPRESS(data, data_len, out_len, quality); +#else // use builtin static unsigned short lengthc[] = { 3,4,5,6,7,8,9,10,11,13,15,17,19,23,27,31,35,43,51,59,67,83,99,115,131,163,195,227,258, 259 }; static unsigned char lengtheb[]= { 0,0,0,0,0,0,0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0 }; static unsigned short distc[] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577, 32768 }; @@ -751,7 +905,9 @@ unsigned char * stbi_zlib_compress(unsigned char *data, int data_len, int *out_l unsigned int bitbuf=0; int i,j, bitcount=0; unsigned char *out = NULL; - unsigned char ***hash_table = (unsigned char***) STBIW_MALLOC(stbiw__ZHASH * sizeof(char**)); + unsigned char ***hash_table = (unsigned char***) STBIW_MALLOC(stbiw__ZHASH * sizeof(unsigned char**)); + if (hash_table == NULL) + return NULL; if (quality < 5) quality = 5; stbiw__sbpush(out, 0x78); // DEFLATE 32K window @@ -772,7 +928,7 @@ unsigned char * stbi_zlib_compress(unsigned char *data, int data_len, int *out_l for (j=0; j < n; ++j) { if (hlist[j]-data > i-32768) { // if entry lies within window int d = stbiw__zlib_countm(hlist[j], data+i, data_len-i); - if (d >= best) best=d,bestloc=hlist[j]; + if (d >= best) { best=d; bestloc=hlist[j]; } } } // when hash table entry is too long, delete half the entries @@ -825,14 +981,31 @@ unsigned char * stbi_zlib_compress(unsigned char *data, int data_len, int *out_l (void) stbiw__sbfree(hash_table[i]); STBIW_FREE(hash_table); + // store uncompressed instead if compression was worse + if (stbiw__sbn(out) > data_len + 2 + ((data_len+32766)/32767)*5) { + stbiw__sbn(out) = 2; // truncate to DEFLATE 32K window and FLEVEL = 1 + for (j = 0; j < data_len;) { + int blocklen = data_len - j; + if (blocklen > 32767) blocklen = 32767; + stbiw__sbpush(out, data_len - j == blocklen); // BFINAL = ?, BTYPE = 0 -- no compression + stbiw__sbpush(out, STBIW_UCHAR(blocklen)); // LEN + stbiw__sbpush(out, STBIW_UCHAR(blocklen >> 8)); + stbiw__sbpush(out, STBIW_UCHAR(~blocklen)); // NLEN + stbiw__sbpush(out, STBIW_UCHAR(~blocklen >> 8)); + memcpy(out+stbiw__sbn(out), data+j, blocklen); + stbiw__sbn(out) += blocklen; + j += blocklen; + } + } + { // compute adler32 on input unsigned int s1=1, s2=0; int blocklen = (int) (data_len % 5552); j=0; while (j < data_len) { - for (i=0; i < blocklen; ++i) s1 += data[j+i], s2 += s1; - s1 %= 65521, s2 %= 65521; + for (i=0; i < blocklen; ++i) { s1 += data[j+i]; s2 += s1; } + s1 %= 65521; s2 %= 65521; j += blocklen; blocklen = 5552; } @@ -845,10 +1018,14 @@ unsigned char * stbi_zlib_compress(unsigned char *data, int data_len, int *out_l // make returned pointer freeable STBIW_MEMMOVE(stbiw__sbraw(out), out, *out_len); return (unsigned char *) stbiw__sbraw(out); +#endif // STBIW_ZLIB_COMPRESS } static unsigned int stbiw__crc32(unsigned char *buffer, int len) { +#ifdef STBIW_CRC32 + return STBIW_CRC32(buffer, len); +#else static unsigned int crc_table[256] = { 0x00000000, 0x77073096, 0xEE0E612C, 0x990951BA, 0x076DC419, 0x706AF48F, 0xE963A535, 0x9E6495A3, @@ -890,6 +1067,7 @@ static unsigned int stbiw__crc32(unsigned char *buffer, int len) for (i=0; i < len; ++i) crc = (crc >> 8) ^ crc_table[buffer[i] ^ (crc & 0xff)]; return ~crc; +#endif } #define stbiw__wpng4(o,a,b,c,d) ((o)[0]=STBIW_UCHAR(a),(o)[1]=STBIW_UCHAR(b),(o)[2]=STBIW_UCHAR(c),(o)[3]=STBIW_UCHAR(d),(o)+=4) @@ -911,61 +1089,91 @@ static unsigned char stbiw__paeth(int a, int b, int c) } // @OPTIMIZE: provide an option that always forces left-predict or paeth predict -unsigned char *stbi_write_png_to_mem(unsigned char *pixels, int stride_bytes, int x, int y, int n, int *out_len) +static void stbiw__encode_png_line(unsigned char *pixels, int stride_bytes, int width, int height, int y, int n, int filter_type, signed char *line_buffer) { + static int mapping[] = { 0,1,2,3,4 }; + static int firstmap[] = { 0,1,0,5,6 }; + int *mymap = (y != 0) ? mapping : firstmap; + int i; + int type = mymap[filter_type]; + unsigned char *z = pixels + stride_bytes * (stbi__flip_vertically_on_write ? height-1-y : y); + int signed_stride = stbi__flip_vertically_on_write ? -stride_bytes : stride_bytes; + + if (type==0) { + memcpy(line_buffer, z, width*n); + return; + } + + // first loop isn't optimized since it's just one pixel + for (i = 0; i < n; ++i) { + switch (type) { + case 1: line_buffer[i] = z[i]; break; + case 2: line_buffer[i] = z[i] - z[i-signed_stride]; break; + case 3: line_buffer[i] = z[i] - (z[i-signed_stride]>>1); break; + case 4: line_buffer[i] = (signed char) (z[i] - stbiw__paeth(0,z[i-signed_stride],0)); break; + case 5: line_buffer[i] = z[i]; break; + case 6: line_buffer[i] = z[i]; break; + } + } + switch (type) { + case 1: for (i=n; i < width*n; ++i) line_buffer[i] = z[i] - z[i-n]; break; + case 2: for (i=n; i < width*n; ++i) line_buffer[i] = z[i] - z[i-signed_stride]; break; + case 3: for (i=n; i < width*n; ++i) line_buffer[i] = z[i] - ((z[i-n] + z[i-signed_stride])>>1); break; + case 4: for (i=n; i < width*n; ++i) line_buffer[i] = z[i] - stbiw__paeth(z[i-n], z[i-signed_stride], z[i-signed_stride-n]); break; + case 5: for (i=n; i < width*n; ++i) line_buffer[i] = z[i] - (z[i-n]>>1); break; + case 6: for (i=n; i < width*n; ++i) line_buffer[i] = z[i] - stbiw__paeth(z[i-n], 0,0); break; + } +} + +STBIWDEF unsigned char *stbi_write_png_to_mem(const unsigned char *pixels, int stride_bytes, int x, int y, int n, int *out_len) +{ + int force_filter = stbi_write_force_png_filter; int ctype[5] = { -1, 0, 4, 2, 6 }; unsigned char sig[8] = { 137,80,78,71,13,10,26,10 }; unsigned char *out,*o, *filt, *zlib; signed char *line_buffer; - int i,j,k,p,zlen; + int j,zlen; if (stride_bytes == 0) stride_bytes = x * n; + if (force_filter >= 5) { + force_filter = -1; + } + filt = (unsigned char *) STBIW_MALLOC((x*n+1) * y); if (!filt) return 0; line_buffer = (signed char *) STBIW_MALLOC(x * n); if (!line_buffer) { STBIW_FREE(filt); return 0; } for (j=0; j < y; ++j) { - static int mapping[] = { 0,1,2,3,4 }; - static int firstmap[] = { 0,1,0,5,6 }; - int *mymap = (j != 0) ? mapping : firstmap; - int best = 0, bestval = 0x7fffffff; - for (p=0; p < 2; ++p) { - for (k= p?best:0; k < 5; ++k) { // @TODO: clarity: rewrite this to go 0..5, and 'continue' the unwanted ones during 2nd pass - int type = mymap[k],est=0; - unsigned char *z = pixels + stride_bytes*j; - for (i=0; i < n; ++i) - switch (type) { - case 0: line_buffer[i] = z[i]; break; - case 1: line_buffer[i] = z[i]; break; - case 2: line_buffer[i] = z[i] - z[i-stride_bytes]; break; - case 3: line_buffer[i] = z[i] - (z[i-stride_bytes]>>1); break; - case 4: line_buffer[i] = (signed char) (z[i] - stbiw__paeth(0,z[i-stride_bytes],0)); break; - case 5: line_buffer[i] = z[i]; break; - case 6: line_buffer[i] = z[i]; break; - } - for (i=n; i < x*n; ++i) { - switch (type) { - case 0: line_buffer[i] = z[i]; break; - case 1: line_buffer[i] = z[i] - z[i-n]; break; - case 2: line_buffer[i] = z[i] - z[i-stride_bytes]; break; - case 3: line_buffer[i] = z[i] - ((z[i-n] + z[i-stride_bytes])>>1); break; - case 4: line_buffer[i] = z[i] - stbiw__paeth(z[i-n], z[i-stride_bytes], z[i-stride_bytes-n]); break; - case 5: line_buffer[i] = z[i] - (z[i-n]>>1); break; - case 6: line_buffer[i] = z[i] - stbiw__paeth(z[i-n], 0,0); break; - } - } - if (p) break; - for (i=0; i < x*n; ++i) + int filter_type; + if (force_filter > -1) { + filter_type = force_filter; + stbiw__encode_png_line((unsigned char*)(pixels), stride_bytes, x, y, j, n, force_filter, line_buffer); + } else { // Estimate the best filter by running through all of them: + int best_filter = 0, best_filter_val = 0x7fffffff, est, i; + for (filter_type = 0; filter_type < 5; filter_type++) { + stbiw__encode_png_line((unsigned char*)(pixels), stride_bytes, x, y, j, n, filter_type, line_buffer); + + // Estimate the entropy of the line using this filter; the less, the better. + est = 0; + for (i = 0; i < x*n; ++i) { est += abs((signed char) line_buffer[i]); - if (est < bestval) { bestval = est; best = k; } + } + if (est < best_filter_val) { + best_filter_val = est; + best_filter = filter_type; + } + } + if (filter_type != best_filter) { // If the last iteration already got us the best filter, don't redo it + stbiw__encode_png_line((unsigned char*)(pixels), stride_bytes, x, y, j, n, best_filter, line_buffer); + filter_type = best_filter; } } - // when we get here, best contains the filter type, and line_buffer contains the data - filt[j*(x*n+1)] = (unsigned char) best; + // when we get here, filter_type contains the filter type, and line_buffer contains the data + filt[j*(x*n+1)] = (unsigned char) filter_type; STBIW_MEMMOVE(filt+j*(x*n+1)+1, line_buffer, x*n); } STBIW_FREE(line_buffer); - zlib = stbi_zlib_compress(filt, y*( x*n+1), &zlen, 8); // increase 8 to get smaller but use more memory + zlib = stbi_zlib_compress(filt, y*( x*n+1), &zlen, stbi_write_png_compression_level); STBIW_FREE(filt); if (!zlib) return 0; @@ -1008,9 +1216,10 @@ STBIWDEF int stbi_write_png(char const *filename, int x, int y, int comp, const { FILE *f; int len; - unsigned char *png = stbi_write_png_to_mem((unsigned char *) data, stride_bytes, x, y, comp, &len); + unsigned char *png = stbi_write_png_to_mem((const unsigned char *) data, stride_bytes, x, y, comp, &len); if (png == NULL) return 0; - f = fopen(filename, "wb"); + + f = stbiw__fopen(filename, "wb"); if (!f) { STBIW_FREE(png); return 0; } fwrite(png, 1, len, f); fclose(f); @@ -1022,7 +1231,7 @@ STBIWDEF int stbi_write_png(char const *filename, int x, int y, int comp, const STBIWDEF int stbi_write_png_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const void *data, int stride_bytes) { int len; - unsigned char *png = stbi_write_png_to_mem((unsigned char *) data, stride_bytes, x, y, comp, &len); + unsigned char *png = stbi_write_png_to_mem((const unsigned char *) data, stride_bytes, x, y, comp, &len); if (png == NULL) return 0; func(context, png, len); STBIW_FREE(png); @@ -1116,26 +1325,31 @@ static void stbiw__jpg_calcBits(int val, unsigned short bits[2]) { bits[0] = val & ((1< 100 ? 100 : quality; quality = quality < 50 ? 5000 / quality : 200 - quality * 2; @@ -1284,7 +1499,7 @@ static int stbi_write_jpg_core(stbi__write_context *s, int width, int height, in static const unsigned char head0[] = { 0xFF,0xD8,0xFF,0xE0,0,0x10,'J','F','I','F',0,1,1,0,0,1,0,1,0,0,0xFF,0xDB,0,0x84,0 }; static const unsigned char head2[] = { 0xFF,0xDA,0,0xC,3,1,0,2,0x11,3,0x11,0,0x3F,0 }; const unsigned char head1[] = { 0xFF,0xC0,0,0x11,8,(unsigned char)(height>>8),STBIW_UCHAR(height),(unsigned char)(width>>8),STBIW_UCHAR(width), - 3,1,0x11,0,2,0x11,1,3,0x11,1,0xFF,0xC4,0x01,0xA2,0 }; + 3,1,(unsigned char)(subsample?0x22:0x11),0,2,0x11,1,3,0x11,1,0xFF,0xC4,0x01,0xA2,0 }; s->func(s->context, (void*)head0, sizeof(head0)); s->func(s->context, (void*)YTable, sizeof(YTable)); stbiw__putc(s, 1); @@ -1307,38 +1522,74 @@ static int stbi_write_jpg_core(stbi__write_context *s, int width, int height, in // Encode 8x8 macroblocks { static const unsigned short fillBits[] = {0x7F, 7}; - const unsigned char *imageData = (const unsigned char *)data; int DCY=0, DCU=0, DCV=0; int bitBuf=0, bitCnt=0; // comp == 2 is grey+alpha (alpha is ignored) int ofsG = comp > 2 ? 1 : 0, ofsB = comp > 2 ? 2 : 0; + const unsigned char *dataR = (const unsigned char *)data; + const unsigned char *dataG = dataR + ofsG; + const unsigned char *dataB = dataR + ofsB; int x, y, pos; - for(y = 0; y < height; y += 8) { - for(x = 0; x < width; x += 8) { - float YDU[64], UDU[64], VDU[64]; - for(row = y, pos = 0; row < y+8; ++row) { - for(col = x; col < x+8; ++col, ++pos) { - int p = row*width*comp + col*comp; - float r, g, b; - if(row >= height) { - p -= width*comp*(row+1 - height); + if(subsample) { + for(y = 0; y < height; y += 16) { + for(x = 0; x < width; x += 16) { + float Y[256], U[256], V[256]; + for(row = y, pos = 0; row < y+16; ++row) { + // row >= height => use last input row + int clamped_row = (row < height) ? row : height - 1; + int base_p = (stbi__flip_vertically_on_write ? (height-1-clamped_row) : clamped_row)*width*comp; + for(col = x; col < x+16; ++col, ++pos) { + // if col >= width => use pixel from last input column + int p = base_p + ((col < width) ? col : (width-1))*comp; + float r = dataR[p], g = dataG[p], b = dataB[p]; + Y[pos]= +0.29900f*r + 0.58700f*g + 0.11400f*b - 128; + U[pos]= -0.16874f*r - 0.33126f*g + 0.50000f*b; + V[pos]= +0.50000f*r - 0.41869f*g - 0.08131f*b; } - if(col >= width) { - p -= comp*(col+1 - width); + } + DCY = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, Y+0, 16, fdtbl_Y, DCY, YDC_HT, YAC_HT); + DCY = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, Y+8, 16, fdtbl_Y, DCY, YDC_HT, YAC_HT); + DCY = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, Y+128, 16, fdtbl_Y, DCY, YDC_HT, YAC_HT); + DCY = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, Y+136, 16, fdtbl_Y, DCY, YDC_HT, YAC_HT); + + // subsample U,V + { + float subU[64], subV[64]; + int yy, xx; + for(yy = 0, pos = 0; yy < 8; ++yy) { + for(xx = 0; xx < 8; ++xx, ++pos) { + int j = yy*32+xx*2; + subU[pos] = (U[j+0] + U[j+1] + U[j+16] + U[j+17]) * 0.25f; + subV[pos] = (V[j+0] + V[j+1] + V[j+16] + V[j+17]) * 0.25f; + } } - - r = imageData[p+0]; - g = imageData[p+ofsG]; - b = imageData[p+ofsB]; - YDU[pos]=+0.29900f*r+0.58700f*g+0.11400f*b-128; - UDU[pos]=-0.16874f*r-0.33126f*g+0.50000f*b; - VDU[pos]=+0.50000f*r-0.41869f*g-0.08131f*b; + DCU = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, subU, 8, fdtbl_UV, DCU, UVDC_HT, UVAC_HT); + DCV = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, subV, 8, fdtbl_UV, DCV, UVDC_HT, UVAC_HT); } } + } + } else { + for(y = 0; y < height; y += 8) { + for(x = 0; x < width; x += 8) { + float Y[64], U[64], V[64]; + for(row = y, pos = 0; row < y+8; ++row) { + // row >= height => use last input row + int clamped_row = (row < height) ? row : height - 1; + int base_p = (stbi__flip_vertically_on_write ? (height-1-clamped_row) : clamped_row)*width*comp; + for(col = x; col < x+8; ++col, ++pos) { + // if col >= width => use pixel from last input column + int p = base_p + ((col < width) ? col : (width-1))*comp; + float r = dataR[p], g = dataG[p], b = dataB[p]; + Y[pos]= +0.29900f*r + 0.58700f*g + 0.11400f*b - 128; + U[pos]= -0.16874f*r - 0.33126f*g + 0.50000f*b; + V[pos]= +0.50000f*r - 0.41869f*g - 0.08131f*b; + } + } - DCY = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, YDU, fdtbl_Y, DCY, YDC_HT, YAC_HT); - DCU = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, UDU, fdtbl_UV, DCU, UVDC_HT, UVAC_HT); - DCV = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, VDU, fdtbl_UV, DCV, UVDC_HT, UVAC_HT); + DCY = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, Y, 8, fdtbl_Y, DCY, YDC_HT, YAC_HT); + DCU = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, U, 8, fdtbl_UV, DCU, UVDC_HT, UVAC_HT); + DCV = stbiw__jpg_processDU(s, &bitBuf, &bitCnt, V, 8, fdtbl_UV, DCV, UVDC_HT, UVAC_HT); + } } } @@ -1355,7 +1606,7 @@ static int stbi_write_jpg_core(stbi__write_context *s, int width, int height, in STBIWDEF int stbi_write_jpg_to_func(stbi_write_func *func, void *context, int x, int y, int comp, const void *data, int quality) { - stbi__write_context s; + stbi__write_context s = { 0 }; stbi__start_write_callbacks(&s, func, context); return stbi_write_jpg_core(&s, x, y, comp, (void *) data, quality); } @@ -1364,7 +1615,7 @@ STBIWDEF int stbi_write_jpg_to_func(stbi_write_func *func, void *context, int x, #ifndef STBI_WRITE_NO_STDIO STBIWDEF int stbi_write_jpg(char const *filename, int x, int y, int comp, const void *data, int quality) { - stbi__write_context s; + stbi__write_context s = { 0 }; if (stbi__start_write_file(&s,filename)) { int r = stbi_write_jpg_core(&s, x, y, comp, data, quality); stbi__end_write_file(&s); @@ -1377,6 +1628,21 @@ STBIWDEF int stbi_write_jpg(char const *filename, int x, int y, int comp, const #endif // STB_IMAGE_WRITE_IMPLEMENTATION /* Revision history + 1.16 (2021-07-11) + make Deflate code emit uncompressed blocks when it would otherwise expand + support writing BMPs with alpha channel + 1.15 (2020-07-13) unknown + 1.14 (2020-02-02) updated JPEG writer to downsample chroma channels + 1.13 + 1.12 + 1.11 (2019-08-11) + + 1.10 (2019-02-07) + support utf8 filenames in Windows; fix warnings and platform ifdefs + 1.09 (2018-02-11) + fix typo in zlib quality API, improve STB_I_W_STATIC in C++ + 1.08 (2018-01-29) + add stbi__flip_vertically_on_write, external zlib, zlib quality, choose PNG filter 1.07 (2017-07-24) doc fix 1.06 (2017-07-23) @@ -1403,7 +1669,7 @@ STBIWDEF int stbi_write_jpg(char const *filename, int x, int y, int comp, const add HDR output fix monochrome BMP 0.95 (2014-08-17) - add monochrome TGA output + add monochrome TGA output 0.94 (2014-05-31) rename private functions to avoid conflicts with stb_image.h 0.93 (2014-05-27) @@ -1421,38 +1687,38 @@ This software is available under 2 licenses -- choose whichever you prefer. ------------------------------------------------------------------------------ ALTERNATIVE A - MIT License Copyright (c) 2017 Sean Barrett -Permission is hereby granted, free of charge, to any person obtaining a copy of -this software and associated documentation files (the "Software"), to deal in -the Software without restriction, including without limitation the rights to -use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies -of the Software, and to permit persons to whom the Software is furnished to do +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal in +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: -The above copyright notice and this permission notice shall be included in all +The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER -LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, -OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ------------------------------------------------------------------------------ ALTERNATIVE B - Public Domain (www.unlicense.org) This is free and unencumbered software released into the public domain. -Anyone is free to copy, modify, publish, use, compile, sell, or distribute this -software, either in source code form or as a compiled binary, for any purpose, +Anyone is free to copy, modify, publish, use, compile, sell, or distribute this +software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means. -In jurisdictions that recognize copyright laws, the author or authors of this -software dedicate any and all copyright interest in the software to the public -domain. We make this dedication for the benefit of the public at large and to -the detriment of our heirs and successors. We intend this dedication to be an -overt act of relinquishment in perpetuity of all present and future rights to +In jurisdictions that recognize copyright laws, the author or authors of this +software dedicate any and all copyright interest in the software to the public +domain. We make this dedication for the benefit of the public at large and to +the detriment of our heirs and successors. We intend this dedication to be an +overt act of relinquishment in perpetuity of all present and future rights to this software under copyright law. -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR -IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, -FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE -AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN -ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN +ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. ------------------------------------------------------------------------------ */ diff --git a/CCM b/CCM new file mode 160000 index 00000000000..20dc73dc42d --- /dev/null +++ b/CCM @@ -0,0 +1 @@ +Subproject commit 20dc73dc42d6fc10720fd40659618141d92d0c4c diff --git a/CMakeLists.txt b/CMakeLists.txt new file mode 100644 index 00000000000..a931a9d9b42 --- /dev/null +++ b/CMakeLists.txt @@ -0,0 +1,737 @@ +cmake_minimum_required(VERSION 3.19) +include(CMakeDependentOption) + +option(CMAKE_VERBOSE_MAKEFILE "Create verbose makefile" OFF) +option(CUDA_VERBOSE_BUILD "Create verbose CUDA build" OFF) +option(BUILD_SHARED_LIBS "Create dark as a shared library" ON) +option(BUILD_AS_CPP "Build Darknet using C++ compiler also for C files" OFF) +option(BUILD_USELIB_TRACK "Build uselib_track" ON) +option(MANUALLY_EXPORT_TRACK_OPTFLOW "Manually export the TRACK_OPTFLOW=1 define" OFF) +option(ENABLE_OPENCV "Enable OpenCV integration" OFF) +option(ENABLE_CUDA "Enable CUDA support" OFF) +cmake_dependent_option(ENABLE_CUDA_OPENGL_INTEGRATION "Build darknet with support for running networks straight from OpenGL textures" ON "ENABLE_CUDA" OFF) +option(ENABLE_CUDNN "Enable CUDNN" OFF) +option(ENABLE_CUDNN_HALF "Enable CUDNN Half precision" OFF) +option(ENABLE_ZED_CAMERA "Enable ZED Camera support" OFF) +option(ENABLE_VCPKG_INTEGRATION "Enable VCPKG integration" OFF) +option(ENABLE_DEPLOY_CUSTOM_CMAKE_MODULES "Copy custom CMake modules for downstream integration" OFF) +option(ENABLE_CSHARP_WRAPPER "Enable building a csharp wrapper" OFF) +option(ENABLE_INSTALLER "Enable building an installer" OFF) +option(VCPKG_BUILD_OPENCV_WITH_CUDA "Build OpenCV with CUDA extension integration" OFF) +option(VCPKG_USE_OPENCV2 "Use legacy OpenCV 2" OFF) +option(VCPKG_USE_OPENCV3 "Use legacy OpenCV 3" OFF) +option(VCPKG_USE_OPENCV4 "Use OpenCV 4" ON) +option(USE_NSIS "Use NSIS as a CPack backend on Windows" ON) +option(SKIP_INSTALL_RUNTIME_LIBS "Do not install runtime libs" OFF) + +if(DEFINED ENV{VCPKG_DEFAULT_TRIPLET}) + message(STATUS "Setting default vcpkg target triplet to $ENV{VCPKG_DEFAULT_TRIPLET}") + set(VCPKG_TARGET_TRIPLET $ENV{VCPKG_DEFAULT_TRIPLET}) +endif() + +if(VCPKG_USE_OPENCV4 AND VCPKG_USE_OPENCV2) + message(STATUS "You required vcpkg feature related to OpenCV 2 but forgot to turn off those for OpenCV 4, doing that for you") + set(VCPKG_USE_OPENCV4 OFF CACHE BOOL "Use OpenCV 4" FORCE) +endif() +if(VCPKG_USE_OPENCV4 AND VCPKG_USE_OPENCV3) + message(STATUS "You required vcpkg feature related to OpenCV 3 but forgot to turn off those for OpenCV 4, doing that for you") + set(VCPKG_USE_OPENCV4 OFF CACHE BOOL "Use OpenCV 4" FORCE) +endif() +if(VCPKG_USE_OPENCV2 AND VCPKG_USE_OPENCV3) + message(STATUS "You required vcpkg features related to both OpenCV 2 and OpenCV 3. Impossible to satisfy, keeping only OpenCV 3") + set(VCPKG_USE_OPENCV2 OFF CACHE BOOL "Use legacy OpenCV 2" FORCE) +endif() + +if(ENABLE_CUDA AND NOT APPLE) + list(APPEND VCPKG_MANIFEST_FEATURES "cuda") +endif() +if(ENABLE_CUDNN AND ENABLE_CUDA AND NOT APPLE) + list(APPEND VCPKG_MANIFEST_FEATURES "cudnn") +endif() +if(ENABLE_OPENCV) + if(VCPKG_BUILD_OPENCV_WITH_CUDA AND NOT APPLE) + if(VCPKG_USE_OPENCV4) + list(APPEND VCPKG_MANIFEST_FEATURES "opencv-cuda") + elseif(VCPKG_USE_OPENCV3) + list(APPEND VCPKG_MANIFEST_FEATURES "opencv3") + elseif(VCPKG_USE_OPENCV2) + list(APPEND VCPKG_MANIFEST_FEATURES "opencv2") + endif() + else() + if(VCPKG_USE_OPENCV4) + list(APPEND VCPKG_MANIFEST_FEATURES "opencv-base") + elseif(VCPKG_USE_OPENCV3) + list(APPEND VCPKG_MANIFEST_FEATURES "opencv3") + elseif(VCPKG_USE_OPENCV2) + list(APPEND VCPKG_MANIFEST_FEATURES "opencv2") + endif() + endif() +endif() + +if(CMAKE_SYSTEM_PROCESSOR MATCHES "^x86" OR CMAKE_SYSTEM_PROCESSOR MATCHES "^AMD64") + set(IS_X86 TRUE) +else() + set(IS_X86 FALSE) +endif() + +if(ENABLE_VCPKG_INTEGRATION AND DEFINED ENV{VCPKG_ROOT} AND NOT DEFINED CMAKE_TOOLCHAIN_FILE) + set(CMAKE_POLICY_DEFAULT_CMP0077 NEW) + set(X_VCPKG_APPLOCAL_DEPS_INSTALL ON) + set(CMAKE_TOOLCHAIN_FILE "$ENV{VCPKG_ROOT}/scripts/buildsystems/vcpkg.cmake" CACHE STRING "") + #set(_VCPKG_INSTALLED_DIR ${CMAKE_CURRENT_LIST_DIR}/vcpkg CACHE STRING "") #folder for manifest-installed dependencies + message(STATUS "VCPKG found: $ENV{VCPKG_ROOT}") + message(STATUS "Using VCPKG integration") + set(USE_INTEGRATED_LIBS "FALSE" CACHE BOOL "Use libs distributed with this repo") + if(VCPKG_MANIFEST_FEATURES) + message(STATUS "VCPKG_MANIFEST_FEATURES: ${VCPKG_MANIFEST_FEATURES}") + endif() +elseif(DEFINED CMAKE_TOOLCHAIN_FILE) + message(STATUS "Using toolchain: ${CMAKE_TOOLCHAIN_FILE}") + if(CMAKE_TOOLCHAIN_FILE MATCHES "vcpkg.cmake") + message(STATUS "Toolchain uses VCPKG integration") + if(VCPKG_MANIFEST_FEATURES) + message(STATUS "VCPKG_MANIFEST_FEATURES: ${VCPKG_MANIFEST_FEATURES}") + endif() + endif() + set(USE_INTEGRATED_LIBS "FALSE" CACHE BOOL "Use libs distributed with this repo") +elseif(WIN32) + message(STATUS "vcpkg not found, toolchain not defined, using integrated libs on win32") + set(USE_INTEGRATED_LIBS "TRUE" CACHE BOOL "Use libs distributed with this repo") +else() + message(WARNING "vcpkg not found, toolchain not defined, system not win32 so build might fail") + set(USE_INTEGRATED_LIBS "TRUE" CACHE BOOL "Use libs distributed with this repo") +endif() + +if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json) + file(READ ${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json VCPKG_JSON_STRING) + string(JSON VERSION_STRING GET ${VCPKG_JSON_STRING} version) +else() + set(VERSION_STRING "0.2.5.4") +endif() + +string(REPLACE "." ";" VERSION_LIST ${VERSION_STRING}) +list(LENGTH VERSION_LIST VERSION_LIST_LENGTH) +if(VERSION_LIST_LENGTH LESS 3) + message(FATAL_ERROR "Darknet needs at least major.minor.patch version numbers to properly configure") +endif() +list(GET VERSION_LIST 0 Darknet_MAJOR_VERSION) +list(GET VERSION_LIST 1 Darknet_MINOR_VERSION) +list(GET VERSION_LIST 2 Darknet_PATCH_VERSION) +if(VERSION_LIST_LENGTH GREATER 3) + list(GET VERSION_LIST 3 Darknet_TWEAK_VERSION) +else() + set(Darknet_TWEAK_VERSION 0) +endif() + +set(Darknet_VERSION ${Darknet_MAJOR_VERSION}.${Darknet_MINOR_VERSION}.${Darknet_PATCH_VERSION}.${Darknet_TWEAK_VERSION}) +message("Darknet_VERSION: ${Darknet_VERSION}") + +project(Darknet VERSION ${Darknet_VERSION}) + +enable_language(C) +enable_language(CXX) + +set(CMAKE_CXX_STANDARD 11) +if(USE_INTEGRATED_LIBS) + set(CMAKE_MODULE_PATH "${CMAKE_CURRENT_LIST_DIR}/cmake/Modules/" ${CMAKE_MODULE_PATH}) +endif() + +if(CMAKE_COMPILER_IS_GNUCC OR "${CMAKE_C_COMPILER_ID}" MATCHES "Clang" OR "${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang") + set(CMAKE_COMPILER_IS_GNUCC_OR_CLANG TRUE) + if("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang" OR "${CMAKE_CXX_COMPILER_ID}" MATCHES "clang") + set(CMAKE_COMPILER_IS_CLANG TRUE) + else() + set(CMAKE_COMPILER_IS_CLANG FALSE) + endif() +else() + set(CMAKE_COMPILER_IS_GNUCC_OR_CLANG FALSE) + set(CMAKE_COMPILER_IS_CLANG FALSE) +endif() + +cmake_dependent_option(ENABLE_SSE_AND_AVX_FLAGS "Enable AVX and SSE optimizations (x86-only)" ON "CMAKE_COMPILER_IS_GNUCC_OR_CLANG;IS_X86" OFF) + +set(default_build_type "Release") +if(NOT CMAKE_BUILD_TYPE AND NOT CMAKE_CONFIGURATION_TYPES) + message(STATUS "Setting build type to '${default_build_type}' as none was specified.") + set(CMAKE_BUILD_TYPE "${default_build_type}" CACHE + STRING "Choose the type of build." FORCE) + # Set the possible values of build type for cmake-gui + set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS + "Debug" "Release" "MinSizeRel" "RelWithDebInfo") +endif() + +if(NOT ENABLE_INSTALLER) + if(CMAKE_INSTALL_PREFIX_INITIALIZED_TO_DEFAULT) + set(CMAKE_INSTALL_PREFIX "${CMAKE_CURRENT_LIST_DIR}" CACHE PATH "Install prefix" FORCE) + endif() + set(INSTALL_BIN_DIR "${CMAKE_CURRENT_LIST_DIR}" CACHE PATH "Path where exe and dll will be installed") + set(INSTALL_LIB_DIR "${CMAKE_CURRENT_LIST_DIR}" CACHE PATH "Path where lib will be installed") +else() + set(INSTALL_BIN_DIR "bin" CACHE PATH "Path where exe and dll will be installed") + set(INSTALL_LIB_DIR "lib" CACHE PATH "Path where lib will be installed") +endif() + +set(INSTALL_INCLUDE_DIR "include/darknet" CACHE PATH "Path where headers will be installed") +set(INSTALL_CMAKE_DIR "share/darknet" CACHE PATH "Path where cmake configs will be installed") + +find_library(MATH_LIBRARY m) + +if(ENABLE_CUDA) + include(CheckLanguage) + check_language(CUDA) + if(NOT CMAKE_CUDA_COMPILER) + message(STATUS "CUDA_PATH: $ENV{CUDA_PATH}") + message(STATUS "CUDACXX: $ENV{CUDACXX}") + message(FATAL_ERROR "CUDA not found, please build explicitly with -DENABLE_CUDA=OFF if you do not want CUDA.") + else() + enable_language(CUDA) + if(CMAKE_CUDA_COMPILER_VERSION VERSION_LESS "9.0") + message(STATUS "CUDA_PATH: $ENV{CUDA_PATH}") + message(STATUS "CUDACXX: $ENV{CUDACXX}") + message(FATAL_ERROR "Unsupported CUDA version, please upgrade to CUDA 9+ or disable CUDA with explicitly with -DENABLE_CUDA=OFF") + else() + if(CMAKE_VERSION VERSION_GREATER_EQUAL "3.23.0" AND NOT DEFINED CMAKE_CUDA_ARCHITECTURES) + set(CMAKE_CUDA_ARCHITECTURES all-major) + endif() + message(STATUS "Selected CMAKE_CUDA_ARCHITECTURES: ${CMAKE_CUDA_ARCHITECTURES}") + if("all-major" IN_LIST CMAKE_CUDA_ARCHITECTURES OR + "all" IN_LIST CMAKE_CUDA_ARCHITECTURES OR + 70 IN_LIST CMAKE_CUDA_ARCHITECTURES OR + 72 IN_LIST CMAKE_CUDA_ARCHITECTURES OR + 75 IN_LIST CMAKE_CUDA_ARCHITECTURES OR + 80 IN_LIST CMAKE_CUDA_ARCHITECTURES OR + 86 IN_LIST CMAKE_CUDA_ARCHITECTURES) + set(ENABLE_CUDNN_HALF "TRUE" CACHE BOOL "Enable CUDNN Half precision" FORCE) + message(STATUS "Your setup supports half precision (CUDA_ARCHITECTURES >= 70)") + else() + set(ENABLE_CUDNN_HALF "FALSE" CACHE BOOL "Enable CUDNN Half precision" FORCE) + message(STATUS "Your setup does not support half precision (it requires CUDA_ARCHITECTURES >= 70)") + endif() + endif() + if(BUILD_SHARED_LIBS) + set(CMAKE_CUDA_RUNTIME_LIBRARY "Shared") + else() + set(CMAKE_CUDA_RUNTIME_LIBRARY "Static") + endif() + endif() +endif() + +if(WIN32 AND ENABLE_CUDA AND CMAKE_MAKE_PROGRAM MATCHES "ninja") + option(SELECT_OPENCV_MODULES "Use only few selected OpenCV modules to circumvent 8192 char limit when using Ninja on Windows" ON) +else() + option(SELECT_OPENCV_MODULES "Use only few selected OpenCV modules to circumvent 8192 char limit when using Ninja on Windows" OFF) +endif() + +if(USE_INTEGRATED_LIBS) + set(PThreads4W_ROOT ${CMAKE_CURRENT_LIST_DIR}/3rdparty/pthreads CACHE PATH "Path where pthreads for windows can be located") + set(Stb_DIR ${CMAKE_CURRENT_LIST_DIR}/3rdparty/stb CACHE PATH "Path where Stb image library can be located") +endif() + +set(CMAKE_DEBUG_POSTFIX d) +set(CMAKE_THREAD_PREFER_PTHREAD ON) +find_package(Threads REQUIRED) +if(MSVC) + find_package(PThreads4W REQUIRED) +endif() +if(ENABLE_OPENCV) + find_package(OpenCV REQUIRED) + if(OpenCV_FOUND) + if(SELECT_OPENCV_MODULES) + if(TARGET opencv_world) + list(APPEND OpenCV_LINKED_COMPONENTS "opencv_world") + else() + if(TARGET opencv_core) + list(APPEND OpenCV_LINKED_COMPONENTS "opencv_core") + endif() + if(TARGET opencv_highgui) + list(APPEND OpenCV_LINKED_COMPONENTS "opencv_highgui") + endif() + if(TARGET opencv_imgproc) + list(APPEND OpenCV_LINKED_COMPONENTS "opencv_imgproc") + endif() + if(TARGET opencv_video) + list(APPEND OpenCV_LINKED_COMPONENTS "opencv_video") + endif() + if(TARGET opencv_videoio) + list(APPEND OpenCV_LINKED_COMPONENTS "opencv_videoio") + endif() + if(TARGET opencv_imgcodecs) + list(APPEND OpenCV_LINKED_COMPONENTS "opencv_imgcodecs") + endif() + if(TARGET opencv_text) + list(APPEND OpenCV_LINKED_COMPONENTS "opencv_text") + endif() + endif() + else() + list(APPEND OpenCV_LINKED_COMPONENTS ${OpenCV_LIBS}) + endif() + endif() +endif() +find_package(Stb REQUIRED) +find_package(OpenMP) + +if(APPLE AND NOT OPENMP_FOUND) + message(STATUS " -> To enable OpenMP on macOS, please install libomp from Homebrew") +endif() + +set(ADDITIONAL_CXX_FLAGS "-Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -Wno-deprecated-declarations -Wno-write-strings") +set(ADDITIONAL_C_FLAGS "-Wall -Wno-unused-result -Wno-unknown-pragmas -Wfatal-errors -Wno-deprecated-declarations -Wno-write-strings") +if(UNIX AND BUILD_SHARED_LIBS AND NOT CMAKE_COMPILER_IS_CLANG) + set(SHAREDLIB_CXX_FLAGS "-Wl,-Bsymbolic") + set(SHAREDLIB_C_FLAGS "-Wl,-Bsymbolic") +endif() + +if(MSVC) + set(ADDITIONAL_CXX_FLAGS " /nologo /wd4013 /wd4018 /wd4028 /wd4047 /wd4068 /wd4090 /wd4101 /wd4113 /wd4133 /wd4190 /wd4244 /wd4267 /wd4305 /wd4477 /wd4996 /wd4819 /fp:fast") + set(ADDITIONAL_C_FLAGS " /nologo /wd4013 /wd4018 /wd4028 /wd4047 /wd4068 /wd4090 /wd4101 /wd4113 /wd4133 /wd4190 /wd4244 /wd4267 /wd4305 /wd4477 /wd4996 /wd4819 /fp:fast") + string(REGEX REPLACE "/O2" "/Ox" CMAKE_CXX_FLAGS_RELEASE ${CMAKE_CXX_FLAGS_RELEASE}) + string(REGEX REPLACE "/O2" "/Ox" CMAKE_C_FLAGS_RELEASE ${CMAKE_C_FLAGS_RELEASE}) +endif() + +if(CMAKE_COMPILER_IS_GNUCC_OR_CLANG) + if(CMAKE_COMPILER_IS_CLANG) + if(UNIX AND NOT APPLE) + set(CMAKE_CXX_FLAGS "-pthread ${CMAKE_CXX_FLAGS}") #force pthread to avoid bugs in some cmake setups + set(CMAKE_C_FLAGS "-pthread ${CMAKE_C_FLAGS}") + endif() + endif() + string(REGEX REPLACE "-O0" "-Og" CMAKE_CXX_FLAGS_DEBUG ${CMAKE_CXX_FLAGS_DEBUG}) + string(REGEX REPLACE "-O3" "-Ofast" CMAKE_CXX_FLAGS_RELEASE ${CMAKE_CXX_FLAGS_RELEASE}) + string(REGEX REPLACE "-O0" "-Og" CMAKE_C_FLAGS_DEBUG ${CMAKE_C_FLAGS_DEBUG}) + string(REGEX REPLACE "-O3" "-Ofast" CMAKE_C_FLAGS_RELEASE ${CMAKE_C_FLAGS_RELEASE}) + if(ENABLE_SSE_AND_AVX_FLAGS) + set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -ffp-contract=fast -mavx -mavx2 -msse3 -msse4.1 -msse4.2 -msse4a") + set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} -ffp-contract=fast -mavx -mavx2 -msse3 -msse4.1 -msse4.2 -msse4a") + endif() +endif() + +set(CMAKE_CXX_FLAGS "${ADDITIONAL_CXX_FLAGS} ${SHAREDLIB_CXX_FLAGS} ${CMAKE_CXX_FLAGS}") +set(CMAKE_C_FLAGS "${ADDITIONAL_C_FLAGS} ${SHAREDLIB_C_FLAGS} ${CMAKE_C_FLAGS}") + +if(OpenCV_FOUND) + if(ENABLE_CUDA AND OpenCV_CUDA_VERSION) + if(TARGET opencv_cudaoptflow) + list(APPEND OpenCV_LINKED_COMPONENTS "opencv_cudaoptflow") + endif() + if(TARGET opencv_cudaimgproc) + list(APPEND OpenCV_LINKED_COMPONENTS "opencv_cudaimgproc") + endif() + elseif(ENABLE_CUDA AND NOT OpenCV_CUDA_VERSION) + set(BUILD_USELIB_TRACK "FALSE" CACHE BOOL "Build uselib_track" FORCE) + message(STATUS " -> darknet is fine for now, but uselib_track has been disabled!") + message(STATUS " -> Please rebuild OpenCV from sources with CUDA support to enable it") + else() + set(BUILD_USELIB_TRACK "FALSE" CACHE BOOL "Build uselib_track" FORCE) + endif() +endif() + +if(ENABLE_CUDA AND ENABLE_CUDNN) + find_package(CUDNN REQUIRED) +endif() + +if(ENABLE_CUDA) + if(MSVC) + set(ADDITIONAL_CXX_FLAGS "${ADDITIONAL_CXX_FLAGS} /DGPU") + + if(ENABLE_CUDA_OPENGL_INTEGRATION) + set(ADDITIONAL_CXX_FLAGS "${ADDITIONAL_CXX_FLAGS} /DCUDA_OPENGL_INTEGRATION") + endif() + + if(CUDNN_FOUND) + set(ADDITIONAL_CXX_FLAGS "${ADDITIONAL_CXX_FLAGS} /DCUDNN") + endif() + if(OpenCV_FOUND) + set(ADDITIONAL_CXX_FLAGS "${ADDITIONAL_CXX_FLAGS} /DOPENCV") + endif() + string(REPLACE " " "," ADDITIONAL_CXX_FLAGS_COMMA_SEPARATED "${ADDITIONAL_CXX_FLAGS}") + set(CUDA_HOST_COMPILER_FLAGS "-Wno-deprecated-declarations -Xcompiler=\"${ADDITIONAL_CXX_FLAGS_COMMA_SEPARATED}\"") + else() + set(ADDITIONAL_CXX_FLAGS "${ADDITIONAL_CXX_FLAGS} -DGPU") + + if(ENABLE_CUDA_OPENGL_INTEGRATION) + set(ADDITIONAL_CXX_FLAGS "${ADDITIONAL_CXX_FLAGS} -DCUDA_OPENGL_INTEGRATION") + endif() + + if(CUDNN_FOUND) + set(ADDITIONAL_CXX_FLAGS "${ADDITIONAL_CXX_FLAGS} -DCUDNN") + endif() + if(OpenCV_FOUND) + set(ADDITIONAL_CXX_FLAGS "${ADDITIONAL_CXX_FLAGS} -DOPENCV") + endif() + if(APPLE) + set(CUDA_HOST_COMPILER_FLAGS "--compiler-options \" ${ADDITIONAL_CXX_FLAGS} -fPIC -Xpreprocessor -fopenmp -Ofast \"") + else() + set(CUDA_HOST_COMPILER_FLAGS "--compiler-options \" ${ADDITIONAL_CXX_FLAGS} -fPIC -fopenmp -Ofast \"") + endif() + endif() + + string (REPLACE ";" " " CUDA_ARCH_FLAGS_SPACE_SEPARATED "${CUDA_ARCH_FLAGS}") + set(CMAKE_CUDA_FLAGS "${CUDA_ARCH_FLAGS_SPACE_SEPARATED} ${CUDA_HOST_COMPILER_FLAGS} ${CMAKE_CUDA_FLAGS}") + message(STATUS "CMAKE_CUDA_FLAGS: ${CMAKE_CUDA_FLAGS}") +endif() + +if(ENABLE_CUDA AND ENABLE_ZED_CAMERA) + find_package(ZED 2 QUIET) + if(ZED_FOUND) + include_directories(${ZED_INCLUDE_DIRS}) + link_directories(${ZED_LIBRARY_DIR}) + message(STATUS "ZED SDK enabled") + else() + message(STATUS "ZED SDK not found") + set(ENABLE_ZED_CAMERA "FALSE" CACHE BOOL "Enable ZED Camera support" FORCE) + endif() +else() + if(ENABLE_ZED_CAMERA) + message(STATUS "ZED SDK not enabled, since it requires CUDA") + endif() + set(ENABLE_ZED_CAMERA "FALSE" CACHE BOOL "Enable ZED Camera support" FORCE) +endif() + +foreach(p LIB BIN INCLUDE CMAKE) + set(var INSTALL_${p}_DIR) + if(NOT IS_ABSOLUTE "${${var}}") + set(FULLPATH_${var} "${CMAKE_INSTALL_PREFIX}/${${var}}") + endif() +endforeach() + +configure_file( + "${CMAKE_CURRENT_LIST_DIR}/src/version.h.in" + "${CMAKE_CURRENT_LIST_DIR}/src/version.h" +) + +#look for all *.h files in src folder +file(GLOB headers "${CMAKE_CURRENT_LIST_DIR}/src/*.h") +#add also files in the include folder +list(APPEND headers + ${CMAKE_CURRENT_LIST_DIR}/include/darknet.h +) +#remove windows only files +if(NOT MSVC) + list(REMOVE_ITEM headers + ${CMAKE_CURRENT_LIST_DIR}/src/gettimeofday.h + ) +endif() +#set(exported_headers ${headers}) + +#look for all *.c files in src folder +file(GLOB sources "${CMAKE_CURRENT_LIST_DIR}/src/*.c") +#add also .cpp files +list(APPEND sources + ${CMAKE_CURRENT_LIST_DIR}/src/http_stream.cpp + ${CMAKE_CURRENT_LIST_DIR}/src/image_opencv.cpp +) +#remove darknet.c file which is necessary only for the executable, not for the lib +list(REMOVE_ITEM sources + ${CMAKE_CURRENT_LIST_DIR}/src/darknet.c +) +#remove windows only files +if(NOT MSVC) + list(REMOVE_ITEM sources + ${CMAKE_CURRENT_LIST_DIR}/src/gettimeofday.c + ) +endif() + +if(USE_INTEGRATED_LIBS AND MSVC) + list(APPEND sources + ${CMAKE_CURRENT_LIST_DIR}/3rdparty/getopt/getopt.c + ${CMAKE_CURRENT_LIST_DIR}/3rdparty/getopt/getopt.h + ) +endif() +if((NOT USE_INTEGRATED_LIBS) AND MSVC) + find_package(unofficial-getopt-win32 REQUIRED) +endif() + +if(ENABLE_CUDA) + file(GLOB cuda_sources "${CMAKE_CURRENT_LIST_DIR}/src/*.cu") +endif() + +if(BUILD_AS_CPP) + set_source_files_properties(${sources} PROPERTIES LANGUAGE CXX) +endif() + +add_library(dark ${CMAKE_CURRENT_LIST_DIR}/include/yolo_v2_class.hpp ${CMAKE_CURRENT_LIST_DIR}/src/yolo_v2_class.cpp ${sources} ${headers} ${cuda_sources}) +set_target_properties(dark PROPERTIES POSITION_INDEPENDENT_CODE ON) +if(ENABLE_CUDA) + set_target_properties(dark PROPERTIES CUDA_SEPARABLE_COMPILATION ON) +endif() +if(BUILD_SHARED_LIBS) + target_compile_definitions(dark PRIVATE LIB_EXPORTS=1) +endif() +if(BUILD_AS_CPP) + set_target_properties(dark PROPERTIES LINKER_LANGUAGE CXX) +endif() +set_target_properties(dark PROPERTIES OUTPUT_NAME "darknet") + +if(OpenCV_FOUND AND OpenCV_VERSION VERSION_GREATER "3.0" AND BUILD_USELIB_TRACK) + add_executable(uselib_track ${CMAKE_CURRENT_LIST_DIR}/src/yolo_console_dll.cpp) +endif() + +add_executable(uselib ${CMAKE_CURRENT_LIST_DIR}/src/yolo_console_dll.cpp) +if(BUILD_AS_CPP) + set_target_properties(uselib PROPERTIES LINKER_LANGUAGE CXX) +endif() + +add_executable(darknet ${CMAKE_CURRENT_LIST_DIR}/src/darknet.c ${sources} ${headers} ${cuda_sources}) +if(WIN32 AND NOT MINGW) + set_target_properties(darknet PROPERTIES PDB_NAME "darknet.exe" IMPORT_SUFFIX ".exe.lib") +endif() +if(BUILD_AS_CPP) + set_source_files_properties(${CMAKE_CURRENT_LIST_DIR}/src/darknet.c PROPERTIES LANGUAGE CXX) + set_target_properties(darknet PROPERTIES LINKER_LANGUAGE CXX) +endif() +if(MSVC) + target_sources(darknet PRIVATE ${CMAKE_CURRENT_LIST_DIR}/src/darknet.rc) +endif() + +add_executable(kmeansiou ${CMAKE_CURRENT_LIST_DIR}/scripts/kmeansiou.c) +if(MATH_LIBRARY) + target_link_libraries(kmeansiou PRIVATE ${MATH_LIBRARY}) +endif() + +target_include_directories(darknet PUBLIC $ $ $ $) +target_include_directories(dark PUBLIC $ $ $ $) +target_include_directories(uselib PUBLIC $ $ $ $) + +target_compile_definitions(darknet PRIVATE -DUSE_CMAKE_LIBS) +target_compile_definitions(dark PRIVATE -DUSE_CMAKE_LIBS) +target_compile_definitions(uselib PRIVATE -DUSE_CMAKE_LIBS) + +if(OpenCV_FOUND AND OpenCV_VERSION VERSION_GREATER "3.0" AND BUILD_USELIB_TRACK AND NOT MANUALLY_EXPORT_TRACK_OPTFLOW) + target_compile_definitions(dark PUBLIC TRACK_OPTFLOW=1) +endif() + +if(CUDNN_FOUND) + target_link_libraries(darknet PRIVATE CuDNN::CuDNN) + target_link_libraries(dark PRIVATE CuDNN::CuDNN) + target_compile_definitions(darknet PRIVATE -DCUDNN) + target_compile_definitions(dark PUBLIC -DCUDNN) + if(ENABLE_CUDNN_HALF) + target_compile_definitions(darknet PRIVATE -DCUDNN_HALF) + target_compile_definitions(dark PUBLIC -DCUDNN_HALF) + endif() +endif() + +if(OpenCV_FOUND) + target_link_libraries(darknet PRIVATE ${OpenCV_LINKED_COMPONENTS}) + target_link_libraries(uselib PRIVATE ${OpenCV_LINKED_COMPONENTS}) + target_link_libraries(dark PUBLIC ${OpenCV_LINKED_COMPONENTS}) + target_include_directories(dark PRIVATE ${OpenCV_INCLUDE_DIRS}) + target_compile_definitions(darknet PRIVATE -DOPENCV) + target_compile_definitions(dark PUBLIC -DOPENCV) + + if(OpenCV_VERSION VERSION_LESS "4.0.0") + find_package(Freetype REQUIRED) + # Link FreeType library to resolve undefined symbols in HarfBuzz, Cairo, and FontConfig + if(TARGET Freetype::Freetype) + target_link_libraries(darknet PRIVATE Freetype::Freetype) + target_link_libraries(dark PUBLIC Freetype::Freetype) + target_link_libraries(uselib PRIVATE Freetype::Freetype) + elseif(TARGET freetype) + target_link_libraries(darknet PRIVATE freetype) + target_link_libraries(dark PUBLIC freetype) + target_link_libraries(uselib PRIVATE freetype) + endif() + endif() + +endif() + +if(OPENMP_FOUND) + target_link_libraries(darknet PRIVATE OpenMP::OpenMP_CXX) + target_link_libraries(darknet PRIVATE OpenMP::OpenMP_C) + target_link_libraries(dark PUBLIC OpenMP::OpenMP_CXX) + target_link_libraries(dark PUBLIC OpenMP::OpenMP_C) +endif() + +if(CMAKE_COMPILER_IS_GNUCC AND MATH_LIBRARY) + target_link_libraries(darknet PRIVATE ${MATH_LIBRARY}) + target_link_libraries(dark PUBLIC ${MATH_LIBRARY}) +endif() + +if(MSVC) + target_link_libraries(darknet PRIVATE PThreads4W::PThreads4W) + target_link_libraries(darknet PRIVATE wsock32) + target_link_libraries(dark PUBLIC PThreads4W::PThreads4W) + target_link_libraries(dark PUBLIC wsock32) + target_link_libraries(uselib PRIVATE PThreads4W::PThreads4W) + if(USE_INTEGRATED_LIBS) + target_include_directories(dark PRIVATE ${CMAKE_CURRENT_LIST_DIR}/3rdparty/getopt) + target_include_directories(darknet PRIVATE ${CMAKE_CURRENT_LIST_DIR}/3rdparty/getopt) + else() + target_link_libraries(dark PRIVATE unofficial::getopt-win32::getopt) + target_link_libraries(darknet PRIVATE unofficial::getopt-win32::getopt) + endif() + target_compile_definitions(darknet PRIVATE -D_CRT_RAND_S -DNOMINMAX -D_USE_MATH_DEFINES) + target_compile_definitions(dark PRIVATE -D_CRT_RAND_S -DNOMINMAX -D_USE_MATH_DEFINES) + target_compile_definitions(dark PUBLIC -D_CRT_SECURE_NO_WARNINGS) + target_compile_definitions(uselib PRIVATE -D_CRT_RAND_S -DNOMINMAX -D_USE_MATH_DEFINES) +endif() + +if(MSVC OR MINGW) + target_link_libraries(darknet PRIVATE ws2_32) + target_link_libraries(dark PUBLIC ws2_32) +endif() + +target_link_libraries(darknet PRIVATE Threads::Threads) +target_link_libraries(dark PUBLIC Threads::Threads) +target_link_libraries(uselib PRIVATE Threads::Threads) + +if(ENABLE_ZED_CAMERA) + target_link_libraries(darknet PRIVATE ${ZED_LIBRARIES}) + target_link_libraries(dark PUBLIC ${ZED_LIBRARIES}) + target_link_libraries(uselib PRIVATE ${ZED_LIBRARIES}) + target_compile_definitions(darknet PRIVATE -DZED_STEREO) + target_compile_definitions(uselib PRIVATE -DZED_STEREO) + target_compile_definitions(dark PUBLIC -DZED_STEREO) +endif() + +if(ENABLE_CUDA) + target_include_directories(darknet PRIVATE ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}) + target_include_directories(dark PUBLIC ${CMAKE_CUDA_TOOLKIT_INCLUDE_DIRECTORIES}) + target_link_libraries(darknet PRIVATE curand cublas cuda) + target_link_libraries(dark PRIVATE curand cublas cuda) + set_target_properties(dark PROPERTIES CUDA_RESOLVE_DEVICE_SYMBOLS ON) + target_compile_definitions(darknet PRIVATE -DGPU) + target_compile_definitions(dark PUBLIC -DGPU) +endif() + +if(ENABLE_CUDA_OPENGL_INTEGRATION) + target_compile_definitions(darknet PRIVATE -DCUDA_OPENGL_INTEGRATION) + target_compile_definitions(dark PUBLIC -DCUDA_OPENGL_INTEGRATION) +endif() + +if(USE_INTEGRATED_LIBS AND WIN32) + target_compile_definitions(darknet PRIVATE -D_TIMESPEC_DEFINED) + target_compile_definitions(dark PRIVATE -D_TIMESPEC_DEFINED) +endif() + +target_link_libraries(uselib PRIVATE dark) +if(OpenCV_FOUND AND OpenCV_VERSION VERSION_GREATER "3.0" AND BUILD_USELIB_TRACK) + target_link_libraries(uselib_track PRIVATE dark) + target_compile_definitions(uselib_track PRIVATE TRACK_OPTFLOW=1) + target_compile_definitions(uselib_track PRIVATE -DUSE_CMAKE_LIBS) + if(BUILD_AS_CPP) + set_target_properties(uselib_track PROPERTIES LINKER_LANGUAGE CXX) + endif() + target_include_directories(uselib_track PRIVATE ${CMAKE_CURRENT_LIST_DIR}/include) + target_link_libraries(uselib_track PRIVATE ${OpenCV_LINKED_COMPONENTS}) + if(ENABLE_ZED_CAMERA) + target_link_libraries(uselib_track PRIVATE ${ZED_LIBRARIES}) + target_compile_definitions(uselib_track PRIVATE -DZED_STEREO) + endif() + if(MSVC) + target_link_libraries(uselib_track PRIVATE PThreads4W::PThreads4W) + target_compile_definitions(uselib_track PRIVATE -D_CRT_RAND_S -DNOMINMAX -D_USE_MATH_DEFINES) + endif() + target_link_libraries(uselib_track PRIVATE Threads::Threads) +endif() + +#set_target_properties(dark PROPERTIES PUBLIC_HEADER "${exported_headers};${CMAKE_CURRENT_LIST_DIR}/include/yolo_v2_class.hpp") +set_target_properties(dark PROPERTIES PUBLIC_HEADER "${CMAKE_CURRENT_LIST_DIR}/include/darknet.h;${CMAKE_CURRENT_LIST_DIR}/include/yolo_v2_class.hpp") + +set_target_properties(dark PROPERTIES CXX_VISIBILITY_PRESET hidden) + +install(TARGETS dark EXPORT DarknetTargets + RUNTIME DESTINATION "${INSTALL_BIN_DIR}" + LIBRARY DESTINATION "${INSTALL_LIB_DIR}" + ARCHIVE DESTINATION "${INSTALL_LIB_DIR}" + PUBLIC_HEADER DESTINATION "${INSTALL_INCLUDE_DIR}" + COMPONENT dev +) +install(TARGETS uselib darknet kmeansiou + DESTINATION "${INSTALL_BIN_DIR}" +) +if(OpenCV_FOUND AND OpenCV_VERSION VERSION_GREATER "3.0" AND BUILD_USELIB_TRACK) + install(TARGETS uselib_track + DESTINATION "${INSTALL_BIN_DIR}" + ) +endif() + +install(EXPORT DarknetTargets + FILE DarknetTargets.cmake + NAMESPACE Darknet:: + DESTINATION "${INSTALL_CMAKE_DIR}" +) + +# Export the package for use from the build-tree (this registers the build-tree with a global CMake-registry) +export(PACKAGE Darknet) + +# Create the DarknetConfig.cmake +# First of all we compute the relative path between the cmake config file and the include path +file(RELATIVE_PATH REL_INCLUDE_DIR "${FULLPATH_INSTALL_CMAKE_DIR}" "${FULLPATH_INSTALL_INCLUDE_DIR}") +set(CONF_INCLUDE_DIRS "${PROJECT_SOURCE_DIR}" "${PROJECT_BINARY_DIR}") +configure_file(DarknetConfig.cmake.in "${PROJECT_BINARY_DIR}/DarknetConfig.cmake" @ONLY) +set(CONF_INCLUDE_DIRS "\${Darknet_CMAKE_DIR}/${REL_INCLUDE_DIR}") +configure_file(DarknetConfig.cmake.in "${PROJECT_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/DarknetConfig.cmake" @ONLY) + +# Create the DarknetConfigVersion.cmake +include(CMakePackageConfigHelpers) +write_basic_package_version_file("${PROJECT_BINARY_DIR}/DarknetConfigVersion.cmake" + COMPATIBILITY SameMajorVersion +) + +install(FILES + "${PROJECT_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/DarknetConfig.cmake" + "${PROJECT_BINARY_DIR}/DarknetConfigVersion.cmake" + DESTINATION "${INSTALL_CMAKE_DIR}" +) + +if (ENABLE_DEPLOY_CUSTOM_CMAKE_MODULES) + install(FILES + "${CMAKE_CURRENT_LIST_DIR}/cmake/Modules/FindCUDNN.cmake" + "${CMAKE_CURRENT_LIST_DIR}/cmake/Modules/FindPThreads4W.cmake" + DESTINATION "${INSTALL_CMAKE_DIR}" + ) +endif() + +if(ENABLE_CSHARP_WRAPPER) + add_subdirectory(src/csharp) +endif() + +if (NOT SKIP_INSTALL_RUNTIME_LIBS) + set(CMAKE_INSTALL_SYSTEM_RUNTIME_LIBS_SKIP TRUE) + include(InstallRequiredSystemLibraries) + install( + PROGRAMS ${CMAKE_INSTALL_SYSTEM_RUNTIME_LIBS} + DESTINATION ${INSTALL_BIN_DIR} + ) +endif() + +if(EXISTS ${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json) + file(READ ${CMAKE_CURRENT_SOURCE_DIR}/vcpkg.json VCPKG_JSON_STRING) + string(JSON CPACK_PACKAGE_NAME GET ${VCPKG_JSON_STRING} name) + string(JSON CPACK_PACKAGE_HOMEPAGE_URL GET ${VCPKG_JSON_STRING} homepage) + string(JSON CPACK_PACKAGE_DESCRIPTION GET ${VCPKG_JSON_STRING} description) + set(CPACK_RESOURCE_FILE_LICENSE ${CMAKE_CURRENT_SOURCE_DIR}/LICENSE) + + if(UNIX AND NOT APPLE) + find_program(LSB_RELEASE_EXEC lsb_release) + execute_process(COMMAND ${LSB_RELEASE_EXEC} -is + OUTPUT_VARIABLE LSB_RELEASE_ID_SHORT + OUTPUT_STRIP_TRAILING_WHITESPACE + ) + if(LSB_RELEASE_ID_SHORT STREQUAL "Ubuntu") + set(CPACK_GENERATOR "DEB") + set(CPACK_DEBIAN_PACKAGE_MAINTAINER "Darknet") + else() + set(CPACK_GENERATOR "RPM") + endif() + elseif(APPLE) + set(CPACK_GENERATOR "DragNDrop") + elseif(WIN32) + set(CPACK_PACKAGE_INSTALL_DIRECTORY ${CPACK_PACKAGE_NAME}) + if(USE_NSIS) + set(CPACK_GENERATOR "NSIS") + string(JSON CPACK_NSIS_PACKAGE_NAME GET ${VCPKG_JSON_STRING} name) + string(JSON CPACK_NSIS_DISPLAY_NAME GET ${VCPKG_JSON_STRING} name) + set(CPACK_NSIS_ENABLE_UNINSTALL_BEFORE_INSTALL "ON") + set(CPACK_NSIS_MODIFY_PATH OFF) #disable extra page for adding to PATH, because it's broken on Win10+ due to NSIS not supporting MAX_PATH + set(CPACK_NSIS_MUI_ICON "${CMAKE_CURRENT_SOURCE_DIR}/src/darknet.ico") + set(CPACK_NSIS_MUI_UNIICON "${CMAKE_CURRENT_SOURCE_DIR}/src/darknet.ico") + else() + set(CPACK_GENERATOR "WIX") + #set(CPACK_WIX_UPGRADE_GUID "") # IMPORTANT! It has to be unique for every project!! + endif() + endif() + + include(CPack) +endif() diff --git a/DarknetConfig.cmake.in b/DarknetConfig.cmake.in new file mode 100644 index 00000000000..ab6388516f3 --- /dev/null +++ b/DarknetConfig.cmake.in @@ -0,0 +1,50 @@ +# Config file for the Darknet package + +get_filename_component(Darknet_CMAKE_DIR "${CMAKE_CURRENT_LIST_FILE}" PATH) +list(APPEND CMAKE_MODULE_PATH "${Darknet_CMAKE_DIR}") + +include(CMakeFindDependencyMacro) + +if(@OpenCV_FOUND@) + find_dependency(OpenCV) +endif() + +if(@ENABLE_CUDA@) + include(CheckLanguage) + check_language(CUDA) + if(NOT CMAKE_CUDA_COMPILER) + message(STATUS " --> WARNING: Unable to find native CUDA integration!") + endif() + if(@CUDNN_FOUND@) + find_dependency(CUDNN) + endif() +endif() + +set(CMAKE_THREAD_PREFER_PTHREAD ON) +find_dependency(Threads) + +if(MSVC) + find_dependency(PThreads4W) + set(CMAKE_CXX_FLAGS "/wd4018 /wd4244 /wd4267 /wd4305 ${CMAKE_CXX_FLAGS}") + if(@unofficial-getopt-win32_FOUND@) + find_dependency(unofficial-getopt-win32) + endif() +endif() + +if(@OPENMP_FOUND@) + find_dependency(OpenMP) +endif() + +# Our library dependencies (contains definitions for IMPORTED targets) +include("${Darknet_CMAKE_DIR}/DarknetTargets.cmake") +include("${Darknet_CMAKE_DIR}/DarknetConfigVersion.cmake") + +if(@OpenCV_FOUND@) + set_property(TARGET Darknet::dark APPEND PROPERTY INTERFACE_INCLUDE_DIRECTORIES "${OpenCV_INCLUDE_DIRS}") +endif() + +get_target_property(FULL_DARKNET_INCLUDE_DIRS Darknet::dark INTERFACE_INCLUDE_DIRECTORIES) +list(GET FULL_DARKNET_INCLUDE_DIRS 0 Darknet_INCLUDE_DIR) +get_filename_component(Darknet_INCLUDE_DIR "${Darknet_INCLUDE_DIR}" REALPATH) + +find_package_handle_standard_args(Darknet REQUIRED_VARS Darknet_INCLUDE_DIR VERSION_VAR PACKAGE_VERSION) diff --git a/Dockerfile.cpu b/Dockerfile.cpu new file mode 100644 index 00000000000..db433f29b25 --- /dev/null +++ b/Dockerfile.cpu @@ -0,0 +1,47 @@ +FROM ubuntu:latest AS builder + +ENV DEBIAN_FRONTEND noninteractive + +RUN apt-get update -y + +RUN apt-get install -y g++ make pkg-config libopencv-dev + +COPY . /darknet + +WORKDIR /darknet + +RUN rm Dockerfile.cpu + +RUN rm Dockerfile.gpu + +RUN rm docker-compose.yml + +RUN make + +FROM ubuntu:latest + +ENV DEBIAN_FRONTEND noninteractive + +RUN apt-get update -y + +RUN apt-get install -y sudo libgomp1 + +RUN useradd -U -m yolo + +RUN usermod -aG sudo yolo + +RUN usermod --shell /bin/bash yolo + +RUN echo "yolo:yolo" | chpasswd + +COPY --from=builder /darknet /home/yolo/darknet + +RUN cp /home/yolo/darknet/libdarknet.so /usr/local/lib/libdarknet.so || echo "libso not used" + +RUN cp /home/yolo/darknet/include/darknet.h /usr/local/include/darknet.h + +RUN ldconfig + +WORKDIR /home/yolo/darknet + +USER yolo diff --git a/Dockerfile.gpu b/Dockerfile.gpu new file mode 100644 index 00000000000..f1985fbcfc0 --- /dev/null +++ b/Dockerfile.gpu @@ -0,0 +1,47 @@ +FROM nvidia/cuda:11.6.0-cudnn8-devel-ubuntu20.04 AS builder + +ENV DEBIAN_FRONTEND noninteractive + +RUN apt-get update -y + +RUN apt-get install -y g++ make pkg-config libopencv-dev + +COPY . /darknet + +WORKDIR /darknet + +RUN rm Dockerfile.cpu + +RUN rm Dockerfile.gpu + +RUN rm docker-compose.yml + +RUN make + +FROM nvidia/cuda:11.6.1-cudnn8-devel-ubuntu20.04 + +ENV DEBIAN_FRONTEND noninteractive + +RUN apt-get update -y + +RUN apt-get install -y sudo libgomp1 + +RUN useradd -U -m yolo + +RUN usermod -aG sudo yolo + +RUN usermod --shell /bin/bash yolo + +RUN echo "yolo:yolo" | chpasswd + +COPY --from=builder /darknet /home/yolo/darknet + +RUN cp /home/yolo/darknet/libdarknet.so /usr/local/lib/libdarknet.so || echo "libso not used" + +RUN cp /home/yolo/darknet/include/darknet.h /usr/local/include/darknet.h + +RUN ldconfig + +WORKDIR /home/yolo/darknet + +USER yolo diff --git a/Makefile b/Makefile index 5c39df1f554..167d071585b 100644 --- a/Makefile +++ b/Makefile @@ -5,73 +5,127 @@ OPENCV=0 AVX=0 OPENMP=0 LIBSO=0 +ZED_CAMERA=0 +ZED_CAMERA_v2_8=0 # set GPU=1 and CUDNN=1 to speedup on GPU -# set CUDNN_HALF=1 to further speedup 3 x times (Mixed-precision using Tensor Cores) on GPU Tesla V100, Titan V, DGX-2 +# set CUDNN_HALF=1 to further speedup 3 x times (Mixed-precision on Tensor Cores) GPU: Volta, Xavier, Turing, Ampere, Ada and higher # set AVX=1 and OPENMP=1 to speedup on CPU (if error occurs then set AVX=0) +# set ZED_CAMERA=1 to enable ZED SDK 3.0 and above +# set ZED_CAMERA_v2_8=1 to enable ZED SDK 2.X +USE_CPP=0 DEBUG=0 -ARCH= -gencode arch=compute_30,code=sm_30 \ - -gencode arch=compute_35,code=sm_35 \ - -gencode arch=compute_50,code=[sm_50,compute_50] \ +ARCH= -gencode arch=compute_50,code=[sm_50,compute_50] \ -gencode arch=compute_52,code=[sm_52,compute_52] \ - -gencode arch=compute_61,code=[sm_61,compute_61] + -gencode arch=compute_61,code=[sm_61,compute_61] OS := $(shell uname) -# Tesla V100 -# ARCH= -gencode arch=compute_70,code=[sm_70,compute_70] +# Naming confusion with recent RTX cards. +# "NVIDIA Quadro RTX x000" and T1000/Tx00 are Turing Architecture Family with Compute Capability of 7.5 +# "NVIDIA RTX Ax000" are Ampere Architecture Family with Compute Capability of 8.6 +# NVIDIA "RTX x000 Ada" are Ada Lovelace Architecture Family with Compute Capability of 8.9 +# Source https://developer.nvidia.com/cuda-gpus + +# KEPLER, GeForce GTX 770, GTX 760, GT 740 +# ARCH= -gencode arch=compute_30,code=sm_30 + +# MAXWELL, GeForce GTX 950, 960, 970, 980, 980 Ti, "GTX" Titan X +# ARCH= -gencode arch=compute_52,code=sm_52 -# GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030, Titan Xp, Tesla P40, Tesla P4 -# ARCH= -gencode arch=compute_61,code=sm_61 -gencode arch=compute_61,code=compute_61 +# Jetson TX1, Tegra X1, DRIVE CX, DRIVE PX, Jetson Nano (2GB, 4GB) +# ARCH= -gencode arch=compute_53,code=[sm_53,compute_53] -# GP100/Tesla P100 – DGX-1 +# GP100/Tesla P100 - DGX-1 # ARCH= -gencode arch=compute_60,code=sm_60 -# For Jetson Tx1 uncomment: -# ARCH= -gencode arch=compute_51,code=[sm_51,compute_51] +# PASCAL, GTX 10x0, GTX 10x0 Ti, Titan Xp, Tesla P40, Tesla P4 +# ARCH= -gencode arch=compute_61,code=[sm_61,compute_61] -# For Jetson Tx2 or Drive-PX2 uncomment: +# For Jetson TX2, Jetson Nano TX2 or Drive-PX2 uncomment: # ARCH= -gencode arch=compute_62,code=[sm_62,compute_62] +# Tesla V100 +# ARCH= -gencode arch=compute_70,code=[sm_70,compute_70] + +# Jetson XAVIER, XAVIER NX +# ARCH= -gencode arch=compute_72,code=[sm_72,compute_72] + +# GeForce Titan RTX, RTX 20x0, RTX 20x0 Ti, Quadro RTX x000, Tesla T4, XNOR Tensor Cores +# ARCH= -gencode arch=compute_75,code=[sm_75,compute_75] + +# Tesla A100 (GA100), DGX-A100, A30, A100, RTX 3080 +# ARCH= -gencode arch=compute_80,code=[sm_80,compute_80] + +# GeForce RTX 30x0, 30x0 Ti, Tesla GA10x, RTX Axxxx, A2, A10, A16, A40 +# ARCH= -gencode arch=compute_86,code=[sm_86,compute_86] + +# NOT TESTED, THEORETICAL +# Jetson ORIN, ORIN NX, ORIN NANO +# ARCH= -gencode arch=compute_87,code=[sm_87,compute_87] + +# NOT TESTED, THEORETICAL +# GeForce RTX 4070 Ti, 4080, 4090, L4, L40 +# ARCH= -gencode arch=compute_89,code=[sm_89,compute_89] + +# NOT TESTED, THEORETICAL +# Nvidia H100 +# ARCH= -gencode arch=compute_90,code=[sm_90,compute_90] VPATH=./src/ EXEC=darknet OBJDIR=./obj/ ifeq ($(LIBSO), 1) -LIBNAMESO=darknet.so +LIBNAMESO=libdarknet.so APPNAMESO=uselib endif +ifeq ($(USE_CPP), 1) +CC=g++ +else CC=gcc -CPP=g++ -NVCC=nvcc -OPTS=-Ofast -LDFLAGS= -lm -pthread -COMMON= -CFLAGS=-Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas +endif -ifeq ($(DEBUG), 1) -OPTS= -O0 -g +CPP=g++ -std=c++11 +NVCC=nvcc +OPTS=-Ofast +LDFLAGS= -lm -pthread +COMMON= -Iinclude/ -I3rdparty/stb/include +CFLAGS=-Wall -Wfatal-errors -Wno-unused-result -Wno-unknown-pragmas -fPIC -rdynamic + +ifeq ($(DEBUG), 1) +#OPTS= -O0 -g +#OPTS= -Og -g +COMMON+= -DDEBUG +CFLAGS+= -DDEBUG else -ifeq ($(AVX), 1) -CFLAGS+= -ffp-contract=fast -mavx -msse4.1 -msse4a +ifeq ($(AVX), 1) +CFLAGS+= -ffp-contract=fast -mavx -mavx2 -msse3 -msse4.1 -msse4.2 -msse4a endif endif CFLAGS+=$(OPTS) -ifeq ($(OPENCV), 1) +ifneq (,$(findstring MSYS_NT,$(OS))) +LDFLAGS+=-lws2_32 +endif + +ifeq ($(OPENCV), 1) COMMON+= -DOPENCV CFLAGS+= -DOPENCV -LDFLAGS+= `pkg-config --libs opencv` -COMMON+= `pkg-config --cflags opencv` +LDFLAGS+= `pkg-config --libs opencv4 2> /dev/null || pkg-config --libs opencv` +COMMON+= `pkg-config --cflags opencv4 2> /dev/null || pkg-config --cflags opencv` endif ifeq ($(OPENMP), 1) -CFLAGS+= -fopenmp + ifeq ($(OS),Darwin) #MAC + CFLAGS+= -Xpreprocessor -fopenmp + else + CFLAGS+= -fopenmp + endif LDFLAGS+= -lgomp endif @@ -102,48 +156,60 @@ CFLAGS+= -DCUDNN_HALF ARCH+= -gencode arch=compute_70,code=[sm_70,compute_70] endif -OBJ=http_stream.o gemm.o utils.o cuda.o convolutional_layer.o list.o image.o activations.o im2col.o col2im.o blas.o crop_layer.o dropout_layer.o maxpool_layer.o softmax_layer.o data.o matrix.o network.o connected_layer.o cost_layer.o parser.o option_list.o darknet.o detection_layer.o captcha.o route_layer.o writing.o box.o nightmare.o normalization_layer.o avgpool_layer.o coco.o dice.o yolo.o detector.o layer.o compare.o classifier.o local_layer.o swag.o shortcut_layer.o activation_layer.o rnn_layer.o gru_layer.o rnn.o rnn_vid.o crnn_layer.o demo.o tag.o cifar.o go.o batchnorm_layer.o art.o region_layer.o reorg_layer.o reorg_old_layer.o super.o voxel.o tree.o yolo_layer.o upsample_layer.o -ifeq ($(GPU), 1) -LDFLAGS+= -lstdc++ +ifeq ($(ZED_CAMERA), 1) +CFLAGS+= -DZED_STEREO -I/usr/local/zed/include +ifeq ($(ZED_CAMERA_v2_8), 1) +LDFLAGS+= -L/usr/local/zed/lib -lsl_core -lsl_input -lsl_zed +#-lstdc++ -D_GLIBCXX_USE_CXX11_ABI=0 +else +LDFLAGS+= -L/usr/local/zed/lib -lsl_zed +#-lstdc++ -D_GLIBCXX_USE_CXX11_ABI=0 +endif +endif + +OBJ=image_opencv.o http_stream.o gemm.o utils.o dark_cuda.o convolutional_layer.o list.o image.o activations.o im2col.o col2im.o blas.o crop_layer.o dropout_layer.o maxpool_layer.o softmax_layer.o data.o matrix.o network.o connected_layer.o cost_layer.o parser.o option_list.o darknet.o detection_layer.o captcha.o route_layer.o writing.o box.o nightmare.o normalization_layer.o avgpool_layer.o coco.o dice.o yolo.o detector.o layer.o compare.o classifier.o local_layer.o swag.o shortcut_layer.o representation_layer.o activation_layer.o rnn_layer.o gru_layer.o rnn.o rnn_vid.o crnn_layer.o demo.o tag.o cifar.o go.o batchnorm_layer.o art.o region_layer.o reorg_layer.o reorg_old_layer.o super.o voxel.o tree.o yolo_layer.o gaussian_yolo_layer.o upsample_layer.o lstm_layer.o conv_lstm_layer.o scale_channels_layer.o sam_layer.o +ifeq ($(GPU), 1) +LDFLAGS+= -lstdc++ OBJ+=convolutional_kernels.o activation_kernels.o im2col_kernels.o col2im_kernels.o blas_kernels.o crop_layer_kernels.o dropout_layer_kernels.o maxpool_layer_kernels.o network_kernels.o avgpool_layer_kernels.o endif OBJS = $(addprefix $(OBJDIR), $(OBJ)) -DEPS = $(wildcard src/*.h) Makefile +DEPS = $(wildcard src/*.h) Makefile include/darknet.h -all: obj backup results $(EXEC) $(LIBNAMESO) $(APPNAMESO) +all: $(OBJDIR) backup results setchmod $(EXEC) $(LIBNAMESO) $(APPNAMESO) -ifeq ($(LIBSO), 1) +ifeq ($(LIBSO), 1) CFLAGS+= -fPIC -$(LIBNAMESO): $(OBJS) src/yolo_v2_class.hpp src/yolo_v2_class.cpp - $(CPP) -shared -std=c++11 -fvisibility=hidden -DYOLODLL_EXPORTS $(COMMON) $(CFLAGS) $(OBJS) src/yolo_v2_class.cpp -o $@ $(LDFLAGS) - -$(APPNAMESO): $(LIBNAMESO) src/yolo_v2_class.hpp src/yolo_console_dll.cpp +$(LIBNAMESO): $(OBJDIR) $(OBJS) include/yolo_v2_class.hpp src/yolo_v2_class.cpp + $(CPP) -shared -std=c++11 -fvisibility=hidden -DLIB_EXPORTS $(COMMON) $(CFLAGS) $(OBJS) src/yolo_v2_class.cpp -o $@ $(LDFLAGS) + +$(APPNAMESO): $(LIBNAMESO) include/yolo_v2_class.hpp src/yolo_console_dll.cpp $(CPP) -std=c++11 $(COMMON) $(CFLAGS) -o $@ src/yolo_console_dll.cpp $(LDFLAGS) -L ./ -l:$(LIBNAMESO) endif $(EXEC): $(OBJS) - $(CPP) $(COMMON) $(CFLAGS) $^ -o $@ $(LDFLAGS) + $(CPP) -std=c++11 $(COMMON) $(CFLAGS) $^ -o $@ $(LDFLAGS) $(OBJDIR)%.o: %.c $(DEPS) $(CC) $(COMMON) $(CFLAGS) -c $< -o $@ $(OBJDIR)%.o: %.cpp $(DEPS) - $(CPP) $(COMMON) $(CFLAGS) -c $< -o $@ + $(CPP) -std=c++11 $(COMMON) $(CFLAGS) -c $< -o $@ $(OBJDIR)%.o: %.cu $(DEPS) $(NVCC) $(ARCH) $(COMMON) --compiler-options "$(CFLAGS)" -c $< -o $@ -obj: - mkdir -p obj +$(OBJDIR): + mkdir -p $(OBJDIR) backup: mkdir -p backup results: mkdir -p results +setchmod: + chmod +x *.sh .PHONY: clean clean: rm -rf $(OBJS) $(EXEC) $(LIBNAMESO) $(APPNAMESO) - diff --git a/README.md b/README.md index 0dc62301ba7..fa11fc21488 100644 --- a/README.md +++ b/README.md @@ -1,273 +1,532 @@ -# Yolo-v3 and Yolo-v2 for Windows and Linux -### (neural network for object detection) +# Yolo v4, v3 and v2 for Windows and Linux -[![CircleCI](https://circleci.com/gh/AlexeyAB/darknet.svg?style=svg)](https://circleci.com/gh/AlexeyAB/darknet) +* Read the FAQ: https://www.ccoderun.ca/programming/darknet_faq/ +* Join the Darknet/YOLO Discord: https://discord.gg/zSq8rtW +* Recommended GitHub repo for Darknet/YOLO: https://github.com/hank-ai/darknetcv/ +* Hank.ai and Darknet/YOLO: https://hank.ai/darknet-welcomes-hank-ai-as-official-sponsor-and-commercial-entity/ + +## (neural networks for object detection) + +* Paper **YOLOv7**: https://arxiv.org/abs/2207.02696 + +* source code YOLOv7 - Pytorch (use to reproduce results): https://github.com/WongKinYiu/yolov7 + +---- + +* Paper **YOLOv4**: https://arxiv.org/abs/2004.10934 + +* source code YOLOv4 - Darknet (use to reproduce results): https://github.com/AlexeyAB/darknet + +---- + +* Paper **Scaled-YOLOv4 (CVPR 2021)**: https://openaccess.thecvf.com/content/CVPR2021/html/Wang_Scaled-YOLOv4_Scaling_Cross_Stage_Partial_Network_CVPR_2021_paper.html + +* source code Scaled-YOLOv4 - Pytorch (use to reproduce results): https://github.com/WongKinYiu/ScaledYOLOv4 + +---- + +### YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors + +* **Paper**: https://arxiv.org/abs/2207.02696 + +* **source code - Pytorch (use to reproduce results):** https://github.com/WongKinYiu/yolov7 + + +YOLOv7 is more accurate and faster than YOLOv5 by **120%** FPS, than YOLOX by **180%** FPS, than Dual-Swin-T by **1200%** FPS, than ConvNext by **550%** FPS, than SWIN-L by **500%** FPS, than PPYOLOE-X by **150%** FPS. + +YOLOv7 surpasses all known object detectors in both speed and accuracy in the range from 5 FPS to 160 FPS and has the highest accuracy 56.8% AP among all known real-time object detectors with 30 FPS or higher on GPU V100, batch=1. + +* YOLOv7-e6 (55.9% AP, 56 FPS V100 b=1) by `+500%` FPS faster than SWIN-L C-M-RCNN (53.9% AP, 9.2 FPS A100 b=1) +* YOLOv7-e6 (55.9% AP, 56 FPS V100 b=1) by `+550%` FPS faster than ConvNeXt-XL C-M-RCNN (55.2% AP, 8.6 FPS A100 b=1) +* YOLOv7-w6 (54.6% AP, 84 FPS V100 b=1) by `+120%` FPS faster than YOLOv5-X6-r6.1 (55.0% AP, 38 FPS V100 b=1) +* YOLOv7-w6 (54.6% AP, 84 FPS V100 b=1) by `+1200%` FPS faster than Dual-Swin-T C-M-RCNN (53.6% AP, 6.5 FPS V100 b=1) +* YOLOv7x (52.9% AP, 114 FPS V100 b=1) by `+150%` FPS faster than PPYOLOE-X (51.9% AP, 45 FPS V100 b=1) +* YOLOv7 (51.2% AP, 161 FPS V100 b=1) by `+180%` FPS faster than YOLOX-X (51.1% AP, 58 FPS V100 b=1) + + +---- + +![more5](https://user-images.githubusercontent.com/4096485/179425274-f55a36d4-8450-4471-816b-8c105841effd.jpg) + +---- + +![image](https://user-images.githubusercontent.com/4096485/177675030-a929ee00-0eba-4d93-95c2-225231d0fd61.png) + + +---- + +More details in articles on medium: + +- [Scaled_YOLOv4](https://alexeyab84.medium.com/scaled-yolo-v4-is-the-best-neural-network-for-object-detection-on-ms-coco-dataset-39dfa22fa982?source=friends_link&sk=c8553bfed861b1a7932f739d26f487c8) +- [YOLOv4](https://medium.com/@alexeyab84/yolov4-the-most-accurate-real-time-neural-network-on-ms-coco-dataset-73adfd3602fe?source=friends_link&sk=6039748846bbcf1d960c3061542591d7) + +Manual: https://github.com/AlexeyAB/darknet/wiki -1. [How to use](#how-to-use) -2. [How to compile on Linux](#how-to-compile-on-linux) -3. [How to compile on Windows](#how-to-compile-on-windows) -4. [How to train (Pascal VOC Data)](#how-to-train-pascal-voc-data) -5. [How to train (to detect your custom objects)](#how-to-train-to-detect-your-custom-objects) -6. [When should I stop training](#when-should-i-stop-training) -7. [How to calculate mAP on PascalVOC 2007](#how-to-calculate-map-on-pascalvoc-2007) -8. [How to improve object detection](#how-to-improve-object-detection) -9. [How to mark bounded boxes of objects and create annotation files](#how-to-mark-bounded-boxes-of-objects-and-create-annotation-files) -10. [Using Yolo9000](#using-yolo9000) -11. [How to use Yolo as DLL](#how-to-use-yolo-as-dll) +Discussion: +- [Discord](https://discord.gg/zSq8rtW) +About Darknet framework: http://pjreddie.com/darknet/ -| ![Darknet Logo](http://pjreddie.com/media/files/darknet-black-small.png) |   ![map_fps](https://hsto.org/webt/pw/zd/0j/pwzd0jb9g7znt_dbsyw9qzbnvti.jpeg) mAP (AP50) https://pjreddie.com/media/files/papers/YOLOv3.pdf | +[![Darknet Continuous Integration](https://github.com/AlexeyAB/darknet/workflows/Darknet%20Continuous%20Integration/badge.svg)](https://github.com/AlexeyAB/darknet/actions?query=workflow%3A%22Darknet+Continuous+Integration%22) +[![CircleCI](https://circleci.com/gh/AlexeyAB/darknet.svg?style=svg)](https://circleci.com/gh/AlexeyAB/darknet) +[![Contributors](https://img.shields.io/github/contributors/AlexeyAB/Darknet.svg)](https://github.com/AlexeyAB/darknet/graphs/contributors) +[![License: Unlicense](https://img.shields.io/badge/license-Unlicense-blue.svg)](https://github.com/AlexeyAB/darknet/blob/master/LICENSE) +[![DOI](https://zenodo.org/badge/75388965.svg)](https://zenodo.org/badge/latestdoi/75388965) +[![arxiv.org](http://img.shields.io/badge/cs.CV-arXiv%3A2004.10934-B31B1B.svg)](https://arxiv.org/abs/2004.10934) +[![arxiv.org](http://img.shields.io/badge/cs.CV-arXiv%3A2011.08036-B31B1B.svg)](https://arxiv.org/abs/2011.08036) +[![colab](https://user-images.githubusercontent.com/4096485/86174089-b2709f80-bb29-11ea-9faf-3d8dc668a1a5.png)](https://colab.research.google.com/drive/12QusaaRj_lUwCGDvQNfICpa7kA7_a2dE) +[![colab](https://user-images.githubusercontent.com/4096485/86174097-b56b9000-bb29-11ea-9240-c17f6bacfc34.png)](https://colab.research.google.com/drive/1_GdoqCJWXsChrOiY8sZMr_zbr_fH-0Fg) + +- [YOLOv4 model zoo](https://github.com/AlexeyAB/darknet/wiki/YOLOv4-model-zoo) +- [Requirements (and how to install dependencies)](#requirements-for-windows-linux-and-macos) +- [Pre-trained models](#pre-trained-models) +- [FAQ - frequently asked questions](https://github.com/AlexeyAB/darknet/wiki/FAQ---frequently-asked-questions) +- [Explanations in issues](https://github.com/AlexeyAB/darknet/issues?q=is%3Aopen+is%3Aissue+label%3AExplanations) +- [Yolo v4 in other frameworks (TensorRT, TensorFlow, PyTorch, OpenVINO, OpenCV-dnn, TVM,...)](#yolo-v4-in-other-frameworks) +- [Datasets](#datasets) + +- [Yolo v4, v3 and v2 for Windows and Linux](#yolo-v4-v3-and-v2-for-windows-and-linux) + - [(neural networks for object detection)](#neural-networks-for-object-detection) + - [GeForce RTX 2080 Ti](#geforce-rtx-2080-ti) + - [Youtube video of results](#youtube-video-of-results) + - [How to evaluate AP of YOLOv4 on the MS COCO evaluation server](#how-to-evaluate-ap-of-yolov4-on-the-ms-coco-evaluation-server) + - [How to evaluate FPS of YOLOv4 on GPU](#how-to-evaluate-fps-of-yolov4-on-gpu) + - [Pre-trained models](#pre-trained-models) + - [Requirements for Windows, Linux and macOS](#requirements-for-windows-linux-and-macos) + - [Yolo v4 in other frameworks](#yolo-v4-in-other-frameworks) + - [Datasets](#datasets) + - [Improvements in this repository](#improvements-in-this-repository) + - [How to use on the command line](#how-to-use-on-the-command-line) + - [For using network video-camera mjpeg-stream with any Android smartphone](#for-using-network-video-camera-mjpeg-stream-with-any-android-smartphone) + - [How to compile on Linux/macOS (using `CMake`)](#how-to-compile-on-linuxmacos-using-cmake) + - [Using also PowerShell](#using-also-powershell) + - [How to compile on Linux (using `make`)](#how-to-compile-on-linux-using-make) + - [How to compile on Windows (using `CMake`)](#how-to-compile-on-windows-using-cmake) + - [How to compile on Windows (using `vcpkg`)](#how-to-compile-on-windows-using-vcpkg) + - [How to train with multi-GPU](#how-to-train-with-multi-gpu) + - [How to train (to detect your custom objects)](#how-to-train-to-detect-your-custom-objects) + - [How to train tiny-yolo (to detect your custom objects)](#how-to-train-tiny-yolo-to-detect-your-custom-objects) + - [When should I stop training](#when-should-i-stop-training) + - [Custom object detection](#custom-object-detection) + - [How to improve object detection](#how-to-improve-object-detection) + - [How to mark bounded boxes of objects and create annotation files](#how-to-mark-bounded-boxes-of-objects-and-create-annotation-files) + - [How to use Yolo as DLL and SO libraries](#how-to-use-yolo-as-dll-and-so-libraries) + - [Citation](#citation) + +![Darknet Logo](http://pjreddie.com/media/files/darknet-black-small.png) + +![scaled_yolov4](https://user-images.githubusercontent.com/4096485/112776361-281d8380-9048-11eb-8083-8728b12dcd55.png) AP50:95 - FPS (Tesla V100) Paper: https://arxiv.org/abs/2011.08036 + +---- + +![modern_gpus](https://user-images.githubusercontent.com/4096485/82835867-f1c62380-9ecd-11ea-9134-1598ed2abc4b.png) AP50:95 / AP50 - FPS (Tesla V100) Paper: https://arxiv.org/abs/2004.10934 + +tkDNN-TensorRT accelerates YOLOv4 **~2x** times for batch=1 and **3x-4x** times for batch=4. + +- tkDNN: https://github.com/ceccocats/tkDNN +- OpenCV: https://gist.github.com/YashasSamaga/48bdb167303e10f4d07b754888ddbdcf + +### GeForce RTX 2080 Ti + +| Network Size | Darknet, FPS (avg) | tkDNN TensorRT FP32, FPS | tkDNN TensorRT FP16, FPS | OpenCV FP16, FPS | tkDNN TensorRT FP16 batch=4, FPS | OpenCV FP16 batch=4, FPS | tkDNN Speedup | +|:--------------------------:|:------------------:|-------------------------:|-------------------------:|-----------------:|---------------------------------:|-------------------------:|--------------:| +|320 | 100 | 116 | **202** | 183 | 423 | **430** | **4.3x** | +|416 | 82 | 103 | **162** | 159 | 284 | **294** | **3.6x** | +|512 | 69 | 91 | 134 | **138** | 206 | **216** | **3.1x** | +|608 | 53 | 62 | 103 | **115** | 150 | **150** | **2.8x** | +|Tiny 416 | 443 | 609 | **790** | 773 | **1774** | 1353 | **3.5x** | +|Tiny 416 CPU Core i7 7700HQ | 3.4 | - | - | 42 | - | 39 | **12x** | + +- Yolo v4 Full comparison: [map_fps](https://user-images.githubusercontent.com/4096485/80283279-0e303e00-871f-11ea-814c-870967d77fd1.png) +- Yolo v4 tiny comparison: [tiny_fps](https://user-images.githubusercontent.com/4096485/85734112-6e366700-b705-11ea-95d1-fcba0de76d72.png) +- CSPNet: [paper](https://arxiv.org/abs/1911.11929) and [map_fps](https://user-images.githubusercontent.com/4096485/71702416-6645dc00-2de0-11ea-8d65-de7d4b604021.png) comparison: https://github.com/WongKinYiu/CrossStagePartialNetworks +- Yolo v3 on MS COCO: [Speed / Accuracy (mAP@0.5) chart](https://user-images.githubusercontent.com/4096485/52151356-e5d4a380-2683-11e9-9d7d-ac7bc192c477.jpg) +- Yolo v3 on MS COCO (Yolo v3 vs RetinaNet) - Figure 3: https://arxiv.org/pdf/1804.02767v1.pdf +- Yolo v2 on Pascal VOC 2007: https://hsto.org/files/a24/21e/068/a2421e0689fb43f08584de9d44c2215f.jpg +- Yolo v2 on Pascal VOC 2012 (comp4): https://hsto.org/files/3a6/fdf/b53/3a6fdfb533f34cee9b52bdd9bb0b19d9.jpg + +#### Youtube video of results + +| [![Yolo v4](https://user-images.githubusercontent.com/4096485/101360000-1a33cf00-38ae-11eb-9e5e-b29c5fb0afbe.png)](https://youtu.be/1_SiUOYUoOI "Yolo v4") | [![Scaled Yolo v4](https://user-images.githubusercontent.com/4096485/101359389-43a02b00-38ad-11eb-866c-f813e96bf61a.png)](https://youtu.be/YDFf-TqJOFE "Scaled Yolo v4") | |---|---| -* Yolo v3 source chart for the RetinaNet on MS COCO got from Table 1 (e): https://arxiv.org/pdf/1708.02002.pdf -* Yolo v2 on Pascal VOC 2007: https://hsto.org/files/a24/21e/068/a2421e0689fb43f08584de9d44c2215f.jpg -* Yolo v2 on Pascal VOC 2012 (comp4): https://hsto.org/files/3a6/fdf/b53/3a6fdfb533f34cee9b52bdd9bb0b19d9.jpg +Others: https://www.youtube.com/user/pjreddie/videos +#### How to evaluate AP of YOLOv4 on the MS COCO evaluation server -# "You Only Look Once: Unified, Real-Time Object Detection (versions 2 & 3)" -A Yolo cross-platform Windows and Linux version (for object detection). Contributtors: https://github.com/pjreddie/darknet/graphs/contributors +1. Download and unzip test-dev2017 dataset from MS COCO server: http://images.cocodataset.org/zips/test2017.zip +2. Download list of images for Detection tasks and replace the paths with yours: https://raw.githubusercontent.com/AlexeyAB/darknet/master/scripts/testdev2017.txt +3. Download `yolov4.weights` file 245 MB: [yolov4.weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights) (Google-drive mirror [yolov4.weights](https://drive.google.com/open?id=1cewMfusmPjYWbrnuJRuKhPMwRe_b9PaT) ) +4. Content of the file `cfg/coco.data` should be -This repository is forked from Linux-version: https://github.com/pjreddie/darknet +```ini +classes= 80 +train = /trainvalno5k.txt +valid = /testdev2017.txt +names = data/coco.names +backup = backup +eval=coco +``` -More details: http://pjreddie.com/darknet/yolo/ +5. Create `/results/` folder near with `./darknet` executable file +6. Run validation: `./darknet detector valid cfg/coco.data cfg/yolov4.cfg yolov4.weights` +7. Rename the file `/results/coco_results.json` to `detections_test-dev2017_yolov4_results.json` and compress it to `detections_test-dev2017_yolov4_results.zip` +8. Submit file `detections_test-dev2017_yolov4_results.zip` to the MS COCO evaluation server for the `test-dev2019 (bbox)` -This repository supports: +#### How to evaluate FPS of YOLOv4 on GPU -* both Windows and Linux -* both OpenCV 2.x.x and OpenCV <= 3.4.0 (3.4.1 and higher isn't supported) -* both cuDNN v5-v7 -* CUDA >= 7.5 -* also create SO-library on Linux and DLL-library on Windows +1. Compile Darknet with `GPU=1 CUDNN=1 CUDNN_HALF=1 OPENCV=1` in the `Makefile` +2. Download `yolov4.weights` file 245 MB: [yolov4.weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights) (Google-drive mirror [yolov4.weights](https://drive.google.com/open?id=1cewMfusmPjYWbrnuJRuKhPMwRe_b9PaT) ) +3. Get any .avi/.mp4 video file (preferably not more than 1920x1080 to avoid bottlenecks in CPU performance) +4. Run one of two commands and look at the AVG FPS: -##### Requires: -* **Linux GCC>=4.9 or Windows MS Visual Studio 2015 (v140)**: https://go.microsoft.com/fwlink/?LinkId=532606&clcid=0x409 (or offline [ISO image](https://go.microsoft.com/fwlink/?LinkId=615448&clcid=0x409)) -* **CUDA 9.1**: https://developer.nvidia.com/cuda-downloads -* **OpenCV 3.4.0**: https://sourceforge.net/projects/opencvlibrary/files/opencv-win/3.4.0/opencv-3.4.0-vc14_vc15.exe/download -* **or OpenCV 2.4.13**: https://sourceforge.net/projects/opencvlibrary/files/opencv-win/2.4.13/opencv-2.4.13.2-vc14.exe/download - - OpenCV allows to show image or video detection in the window and store result to file that specified in command line `-out_filename res.avi` -* **GPU with CC >= 3.0**: https://en.wikipedia.org/wiki/CUDA#GPUs_supported +- include video_capturing + NMS + drawing_bboxes: + `./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights test.mp4 -dont_show -ext_output` +- exclude video_capturing + NMS + drawing_bboxes: + `./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights test.mp4 -benchmark` -##### Pre-trained models for different cfg-files can be downloaded from (smaller -> faster & lower quality): -* `yolov3.cfg` (236 MB COCO **Yolo v3**) - requires 4 GB GPU-RAM: https://pjreddie.com/media/files/yolov3.weights -* `yolov3-tiny.cfg` (34 MB COCO **Yolo v3 tiny**) - requires 1 GB GPU-RAM: https://pjreddie.com/media/files/yolov3-tiny.weights -* `yolov2.cfg` (194 MB COCO Yolo v2) - requires 4 GB GPU-RAM: https://pjreddie.com/media/files/yolov2.weights -* `yolo-voc.cfg` (194 MB VOC Yolo v2) - requires 4 GB GPU-RAM: http://pjreddie.com/media/files/yolo-voc.weights -* `yolov2-tiny.cfg` (43 MB COCO Yolo v2) - requires 1 GB GPU-RAM: https://pjreddie.com/media/files/yolov2-tiny.weights -* `yolov2-tiny-voc.cfg` (60 MB VOC Yolo v2) - requires 1 GB GPU-RAM: http://pjreddie.com/media/files/yolov2-tiny-voc.weights -* `yolo9000.cfg` (186 MB Yolo9000-model) - requires 4 GB GPU-RAM: http://pjreddie.com/media/files/yolo9000.weights +#### Pre-trained models -Put it near compiled: darknet.exe +There are weights-file for different cfg-files (trained for MS COCO dataset): -You can get cfg-files by path: `darknet/cfg/` +FPS on RTX 2070 (R) and Tesla V100 (V): -##### Examples of results: +- [yolov4-p6.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-p6.cfg) - 1280x1280 - **72.1% mAP@0.5 (54.0% AP@0.5:0.95) - 32(V) FPS** - xxx BFlops (xxx FMA) - 487 MB: [yolov4-p6.weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-p6.weights) + - pre-trained weights for training: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-p6.conv.289 -[![Everything Is AWESOME](http://img.youtube.com/vi/VOC3huqHrss/0.jpg)](https://www.youtube.com/watch?v=VOC3huqHrss "Everything Is AWESOME") +- [yolov4-p5.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-p5.cfg) - 896x896 - **70.0% mAP@0.5 (51.6% AP@0.5:0.95) - 43(V) FPS** - xxx BFlops (xxx FMA) - 271 MB: [yolov4-p5.weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-p5.weights) + - pre-trained weights for training: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-p5.conv.232 -Others: https://www.youtube.com/channel/UC7ev3hNVkx4DzZ3LO19oebg +- [yolov4-csp-x-swish.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-csp-x-swish.cfg) - 640x640 - **69.9% mAP@0.5 (51.5% AP@0.5:0.95) - 23(R) FPS / 50(V) FPS** - 221 BFlops (110 FMA) - 381 MB: [yolov4-csp-x-swish.weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-csp-x-swish.weights) + - pre-trained weights for training: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-csp-x-swish.conv.192 -### How to use: +- [yolov4-csp-swish.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-csp-swish.cfg) - 640x640 - **68.7% mAP@0.5 (50.0% AP@0.5:0.95) - 70(V) FPS** - 120 (60 FMA) - 202 MB: [yolov4-csp-swish.weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-csp-swish.weights) + - pre-trained weights for training: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-csp-swish.conv.164 -##### Example of usage in cmd-files from `build\darknet\x64\`: +- [yolov4x-mish.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4x-mish.cfg) - 640x640 - **68.5% mAP@0.5 (50.1% AP@0.5:0.95) - 23(R) FPS / 50(V) FPS** - 221 BFlops (110 FMA) - 381 MB: [yolov4x-mish.weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4x-mish.weights) + - pre-trained weights for training: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4x-mish.conv.166 -* `darknet_yolo_v3.cmd` - initialization with 236 MB **Yolo v3** COCO-model yolov3.weights & yolov3.cfg and show detection on the image: dog.jpg +- [yolov4-csp.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-csp.cfg) - 202 MB: [yolov4-csp.weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-csp.weights) paper [Scaled Yolo v4](https://arxiv.org/abs/2011.08036) -* `darknet_voc.cmd` - initialization with 194 MB VOC-model yolo-voc.weights & yolo-voc.cfg and waiting for entering the name of the image file -* `darknet_demo_voc.cmd` - initialization with 194 MB VOC-model yolo-voc.weights & yolo-voc.cfg and play your video file which you must rename to: test.mp4 -* `darknet_demo_store.cmd` - initialization with 194 MB VOC-model yolo-voc.weights & yolo-voc.cfg and play your video file which you must rename to: test.mp4, and store result to: res.avi -* `darknet_net_cam_voc.cmd` - initialization with 194 MB VOC-model, play video from network video-camera mjpeg-stream (also from you phone) -* `darknet_web_cam_voc.cmd` - initialization with 194 MB VOC-model, play video from Web-Camera number #0 -* `darknet_coco_9000.cmd` - initialization with 186 MB Yolo9000 COCO-model, and show detection on the image: dog.jpg -* `darknet_coco_9000_demo.cmd` - initialization with 186 MB Yolo9000 COCO-model, and show detection on the video (if it is present): street4k.mp4, and store result to: res.avi + just change `width=` and `height=` parameters in `yolov4-csp.cfg` file and use the same `yolov4-csp.weights` file for all cases: + - `width=640 height=640` in cfg: **67.4% mAP@0.5 (48.7% AP@0.5:0.95) - 70(V) FPS** - 120 (60 FMA) BFlops + - `width=512 height=512` in cfg: **64.8% mAP@0.5 (46.2% AP@0.5:0.95) - 93(V) FPS** - 77 (39 FMA) BFlops + - pre-trained weights for training: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-csp.conv.142 -##### How to use on the command line: +- [yolov4.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg) - 245 MB: [yolov4.weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights) (Google-drive mirror [yolov4.weights](https://drive.google.com/open?id=1cewMfusmPjYWbrnuJRuKhPMwRe_b9PaT) ) paper [Yolo v4](https://arxiv.org/abs/2004.10934) + just change `width=` and `height=` parameters in `yolov4.cfg` file and use the same `yolov4.weights` file for all cases: + - `width=608 height=608` in cfg: **65.7% mAP@0.5 (43.5% AP@0.5:0.95) - 34(R) FPS / 62(V) FPS** - 128.5 BFlops + - `width=512 height=512` in cfg: **64.9% mAP@0.5 (43.0% AP@0.5:0.95) - 45(R) FPS / 83(V) FPS** - 91.1 BFlops + - `width=416 height=416` in cfg: **62.8% mAP@0.5 (41.2% AP@0.5:0.95) - 55(R) FPS / 96(V) FPS** - 60.1 BFlops + - `width=320 height=320` in cfg: **60% mAP@0.5 ( 38% AP@0.5:0.95) - 63(R) FPS / 123(V) FPS** - 35.5 BFlops -On Linux use `./darknet` instead of `darknet.exe`, like this:`./darknet detector test ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights` +- [yolov4-tiny.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-tiny.cfg) - **40.2% mAP@0.5 - 371(1080Ti) FPS / 330(RTX2070) FPS** - 6.9 BFlops - 23.1 MB: [yolov4-tiny.weights](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights) -* **Yolo v3** COCO - image: `darknet.exe detector test data/coco.data cfg/yolov3.cfg yolov3.weights -i 0 -thresh 0.25` -* Alternative method Yolo v3 COCO - image: `darknet.exe detect cfg/yolov3.cfg yolov3.weights -i 0 -thresh 0.25` -* Output coordinates of objects: `darknet.exe detector test data/coco.data yolov3.cfg yolov3.weights -thresh 0.25 dog.jpg -ext_output` -* 194 MB VOC-model - image: `darknet.exe detector test data/voc.data yolo-voc.cfg yolo-voc.weights -i 0` -* 194 MB VOC-model - video: `darknet.exe detector demo data/voc.data yolo-voc.cfg yolo-voc.weights test.mp4 -i 0` -* 194 MB VOC-model - **save result to the file res.avi**: `darknet.exe detector demo data/voc.data yolo-voc.cfg yolo-voc.weights test.mp4 -i 0 -out_filename res.avi` -* Alternative method 194 MB VOC-model - video: `darknet.exe yolo demo yolo-voc.cfg yolo-voc.weights test.mp4 -i 0` -* 43 MB VOC-model for video: `darknet.exe detector demo data/coco.data cfg/yolov2-tiny.cfg yolov2-tiny.weights test.mp4 -i 0` -* **Yolo v3** 236 MB COCO for net-videocam - Smart WebCam: `darknet.exe detector demo data/coco.data cfg/yolov3.cfg yolov3.weights http://192.168.0.80:8080/video?dummy=param.mjpg -i 0` -* 194 MB VOC-model for net-videocam - Smart WebCam: `darknet.exe detector demo data/voc.data yolo-voc.cfg yolo-voc.weights http://192.168.0.80:8080/video?dummy=param.mjpg -i 0` -* 194 MB VOC-model - WebCamera #0: `darknet.exe detector demo data/voc.data yolo-voc.cfg yolo-voc.weights -c 0` -* 186 MB Yolo9000 - image: `darknet.exe detector test cfg/combine9k.data yolo9000.cfg yolo9000.weights` -* Remeber to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app -* To process a list of images `data/train.txt` and save results of detection to `result.txt` use: - `darknet.exe detector test data/voc.data yolo-voc.cfg yolo-voc.weights -dont_show < data/train.txt > result.txt` - You can comment this line so that each image does not require pressing the button ESC: https://github.com/AlexeyAB/darknet/blob/6ccb41808caf753feea58ca9df79d6367dedc434/src/detector.c#L509 +- [enet-coco.cfg (EfficientNetB0-Yolov3)](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/enet-coco.cfg) - **45.5% mAP@0.5 - 55(R) FPS** - 3.7 BFlops - 18.3 MB: [enetb0-coco_final.weights](https://drive.google.com/file/d/1FlHeQjWEQVJt0ay1PVsiuuMzmtNyv36m/view) -##### For using network video-camera mjpeg-stream with any Android smartphone: +- [yolov3-openimages.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-openimages.cfg) - 247 MB - 18(R) FPS - OpenImages dataset: [yolov3-openimages.weights](https://pjreddie.com/media/files/yolov3-openimages.weights) -1. Download for Android phone mjpeg-stream soft: IP Webcam / Smart WebCam +
CLICK ME - Yolo v3 models +- [csresnext50-panet-spp-original-optimal.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/csresnext50-panet-spp-original-optimal.cfg) - **65.4% mAP@0.5 (43.2% AP@0.5:0.95) - 32(R) FPS** - 100.5 BFlops - 217 MB: [csresnext50-panet-spp-original-optimal_final.weights](https://drive.google.com/open?id=1_NnfVgj0EDtb_WLNoXV8Mo7WKgwdYZCc) - * Smart WebCam - preferably: https://play.google.com/store/apps/details?id=com.acontech.android.SmartWebCam2 - * IP Webcam: https://play.google.com/store/apps/details?id=com.pas.webcam +- [yolov3-spp.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-spp.cfg) - **60.6% mAP@0.5 - 38(R) FPS** - 141.5 BFlops - 240 MB: [yolov3-spp.weights](https://pjreddie.com/media/files/yolov3-spp.weights) -2. Connect your Android phone to computer by WiFi (through a WiFi-router) or USB -3. Start Smart WebCam on your phone -4. Replace the address below, on shown in the phone application (Smart WebCam) and launch: +- [csresnext50-panet-spp.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/csresnext50-panet-spp.cfg) - **60.0% mAP@0.5 - 44 FPS** - 71.3 BFlops - 217 MB: [csresnext50-panet-spp_final.weights](https://drive.google.com/file/d/1aNXdM8qVy11nqTcd2oaVB3mf7ckr258-/view?usp=sharing) +- [yolov3.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3.cfg) - **55.3% mAP@0.5 - 66(R) FPS** - 65.9 BFlops - 236 MB: [yolov3.weights](https://pjreddie.com/media/files/yolov3.weights) -* 194 MB COCO-model: `darknet.exe detector demo data/coco.data yolo.cfg yolo.weights http://192.168.0.80:8080/video?dummy=param.mjpg -i 0` -* 194 MB VOC-model: `darknet.exe detector demo data/voc.data yolo-voc.cfg yolo-voc.weights http://192.168.0.80:8080/video?dummy=param.mjpg -i 0` +- [yolov3-tiny.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-tiny.cfg) - **33.1% mAP@0.5 - 345(R) FPS** - 5.6 BFlops - 33.7 MB: [yolov3-tiny.weights](https://pjreddie.com/media/files/yolov3-tiny.weights) -### How to compile on Linux: +- [yolov3-tiny-prn.cfg](https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3-tiny-prn.cfg) - **33.1% mAP@0.5 - 370(R) FPS** - 3.5 BFlops - 18.8 MB: [yolov3-tiny-prn.weights](https://drive.google.com/file/d/18yYZWyKbo4XSDVyztmsEcF9B_6bxrhUY/view?usp=sharing) -Just do `make` in the darknet directory. -Before make, you can set such options in the `Makefile`: [link](https://github.com/AlexeyAB/darknet/blob/9c1b9a2cf6363546c152251be578a21f3c3caec6/Makefile#L1) -* `GPU=1` to build with CUDA to accelerate by using GPU (CUDA should be in `/usr/local/cuda`) -* `CUDNN=1` to build with cuDNN v5-v7 to accelerate training by using GPU (cuDNN should be in `/usr/local/cudnn`) -* `CUDNN_HALF=1` to build for Tensor Cores (on Titan V / Tesla V100 / DGX-2 and later) speedup Detection 3x, Training 2x -* `OPENCV=1` to build with OpenCV 3.x/2.4.x - allows to detect on video files and video streams from network cameras or web-cams -* `DEBUG=1` to bould debug version of Yolo -* `OPENMP=1` to build with OpenMP support to accelerate Yolo by using multi-core CPU -* `LIBSO=1` to build a library `darknet.so` and binary runable file `uselib` that uses this library. Or you can try to run so `LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH ./uselib test.mp4` How to use this SO-library from your own code - you can look at C++ example: https://github.com/AlexeyAB/darknet/blob/master/src/yolo_console_dll.cpp +
+
CLICK ME - Yolo v2 models -### How to compile on Windows: +- `yolov2.cfg` (194 MB COCO Yolo v2) - requires 4 GB GPU-RAM: https://pjreddie.com/media/files/yolov2.weights +- `yolo-voc.cfg` (194 MB VOC Yolo v2) - requires 4 GB GPU-RAM: http://pjreddie.com/media/files/yolo-voc.weights +- `yolov2-tiny.cfg` (43 MB COCO Yolo v2) - requires 1 GB GPU-RAM: https://pjreddie.com/media/files/yolov2-tiny.weights +- `yolov2-tiny-voc.cfg` (60 MB VOC Yolo v2) - requires 1 GB GPU-RAM: http://pjreddie.com/media/files/yolov2-tiny-voc.weights +- `yolo9000.cfg` (186 MB Yolo9000-model) - requires 4 GB GPU-RAM: http://pjreddie.com/media/files/yolo9000.weights -1. If you have **MSVS 2015, CUDA 9.1, cuDNN 7.0 and OpenCV 3.x** (with paths: `C:\opencv_3.0\opencv\build\include` & `C:\opencv_3.0\opencv\build\x64\vc14\lib`), then start MSVS, open `build\darknet\darknet.sln`, set **x64** and **Release**, and do the: Build -> Build darknet. **NOTE:** If installing OpenCV, use OpenCV 3.4.0 or earlier. This is a bug in OpenCV 3.4.1 in the C API (see [#500](https://github.com/AlexeyAB/darknet/issues/500)). +
- 1.1. Find files `opencv_world320.dll` and `opencv_ffmpeg320_64.dll` (or `opencv_world340.dll` and `opencv_ffmpeg340_64.dll`) in `C:\opencv_3.0\opencv\build\x64\vc14\bin` and put it near with `darknet.exe` - - 1.2 Check that there are `bin` and `include` folders in the `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1` if aren't, then copy them to this folder from the path where is CUDA installed - - 1.3. To install CUDNN (speedup neural network), do the following: - - * download and install **cuDNN 7.0 for CUDA 9.1**: https://developer.nvidia.com/cudnn - - * add Windows system variable `cudnn` with path to CUDNN: https://hsto.org/files/a49/3dc/fc4/a493dcfc4bd34a1295fd15e0e2e01f26.jpg - - 1.4. If you want to build **without CUDNN** then: open `\darknet.sln` -> (right click on project) -> properties -> C/C++ -> Preprocessor -> Preprocessor Definitions, and remove this: `CUDNN;` +Put it near compiled: darknet.exe -2. If you have other version of **CUDA (not 9.1)** then open `build\darknet\darknet.vcxproj` by using Notepad, find 2 places with "CUDA 9.1" and change it to your CUDA-version, then do step 1 +You can get cfg-files by path: `darknet/cfg/` -3. If you **don't have GPU**, but have **MSVS 2015 and OpenCV 3.0** (with paths: `C:\opencv_3.0\opencv\build\include` & `C:\opencv_3.0\opencv\build\x64\vc14\lib`), then start MSVS, open `build\darknet\darknet_no_gpu.sln`, set **x64** and **Release**, and do the: Build -> Build darknet_no_gpu +### Requirements for Windows, Linux and macOS + +- **CMake >= 3.18**: https://cmake.org/download/ +- **Powershell** (already installed on windows): https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell +- **CUDA >= 10.2**: https://developer.nvidia.com/cuda-toolkit-archive (on Linux do [Post-installation Actions](https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#post-installation-actions)) +- **OpenCV >= 2.4**: use your preferred package manager (brew, apt), build from source using [vcpkg](https://github.com/Microsoft/vcpkg) or download from [OpenCV official site](https://opencv.org/releases.html) (on Windows set system variable `OpenCV_DIR` = `C:\opencv\build` - where are the `include` and `x64` folders [image](https://user-images.githubusercontent.com/4096485/53249516-5130f480-36c9-11e9-8238-a6e82e48c6f2.png)) +- **cuDNN >= 8.0.2** https://developer.nvidia.com/rdp/cudnn-archive (on **Linux** follow steps described here https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#installlinux-tar , on **Windows** follow steps described here https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#installwindows) +- **GPU with CC >= 3.0**: https://en.wikipedia.org/wiki/CUDA#GPUs_supported + +### Yolo v4 in other frameworks + +- **Pytorch - Scaled-YOLOv4:** https://github.com/WongKinYiu/ScaledYOLOv4 +- **TensorFlow:** `pip install yolov4` YOLOv4 on TensorFlow 2.0 / TFlite / Android: https://github.com/hunglc007/tensorflow-yolov4-tflite + Official TF models: https://github.com/tensorflow/models/tree/master/official/vision/beta/projects/yolo + For YOLOv4 - convert `yolov4.weights`/`cfg` files to `yolov4.pb` by using [TNTWEN](https://github.com/TNTWEN/OpenVINO-YOLOV4) project, and to `yolov4.tflite` [TensorFlow-lite](https://www.tensorflow.org/lite/guide/get_started#2_convert_the_model_format) +- **OpenCV** the fastest implementation of YOLOv4 for CPU (x86/ARM-Android), OpenCV can be compiled with [OpenVINO-backend](https://github.com/opencv/opencv/wiki/Intel's-Deep-Learning-Inference-Engine-backend) for running on (Myriad X / USB Neural Compute Stick / Arria FPGA), use `yolov4.weights`/`cfg` with: [C++ example](https://github.com/opencv/opencv/blob/8c25a8eb7b10fb50cda323ee6bec68aa1a9ce43c/samples/dnn/object_detection.cpp#L192-L221) or [Python example](https://github.com/opencv/opencv/blob/8c25a8eb7b10fb50cda323ee6bec68aa1a9ce43c/samples/dnn/object_detection.py#L129-L150) +- **Intel OpenVINO 2021.2:** supports YOLOv4 (NPU Myriad X / USB Neural Compute Stick / Arria FPGA): https://devmesh.intel.com/projects/openvino-yolov4-49c756 read this [manual](https://github.com/TNTWEN/OpenVINO-YOLOV4) (old [manual](https://software.intel.com/en-us/articles/OpenVINO-Using-TensorFlow#converting-a-darknet-yolo-model) ) (for [Scaled-YOLOv4](https://github.com/WongKinYiu/ScaledYOLOv4/tree/yolov4-large) models use https://github.com/Chen-MingChang/pytorch_YOLO_OpenVINO_demo ) +- **PyTorch > ONNX**: + - [WongKinYiu/PyTorch_YOLOv4](https://github.com/WongKinYiu/PyTorch_YOLOv4) + - [maudzung/3D-YOLOv4](https://github.com/maudzung/Complex-YOLOv4-Pytorch) + - [Tianxiaomo/pytorch-YOLOv4](https://github.com/Tianxiaomo/pytorch-YOLOv4) + - [YOLOv5](https://github.com/ultralytics/yolov5) +- **ONNX** on Jetson for YOLOv4: https://developer.nvidia.com/blog/announcing-onnx-runtime-for-jetson/ and https://github.com/ttanzhiqiang/onnx_tensorrt_project +- **nVidia Transfer Learning Toolkit (TLT>=3.0)** Training and Detection https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/object_detection/yolo_v4.html +- **TensorRT+tkDNN**: https://github.com/ceccocats/tkDNN#fps-results +- **Deepstream 5.0 / TensorRT for YOLOv4** https://github.com/NVIDIA-AI-IOT/yolov4_deepstream or https://github.com/marcoslucianops/DeepStream-Yolo read [Yolo is natively supported in DeepStream 4.0](https://news.developer.nvidia.com/deepstream-sdk-4-now-available/) and [PDF](https://docs.nvidia.com/metropolis/deepstream/Custom_YOLO_Model_in_the_DeepStream_YOLO_App.pdf). Additionally [jkjung-avt/tensorrt_demos](https://github.com/jkjung-avt/tensorrt_demos) or [wang-xinyu/tensorrtx](https://github.com/wang-xinyu/tensorrtx) +- **Triton Inference Server / TensorRT** https://github.com/isarsoft/yolov4-triton-tensorrt +- **DirectML** https://github.com/microsoft/DirectML/tree/master/Samples/yolov4 +- **OpenCL** (Intel, AMD, Mali GPUs for macOS & GNU/Linux) https://github.com/sowson/darknet +- **HIP** for Training and Detection on AMD GPU https://github.com/os-hackathon/darknet +- **ROS** (Robot Operating System) https://github.com/engcang/ros-yolo-sort +- **Xilinx Zynq Ultrascale+ Deep Learning Processor (DPU) ZCU102/ZCU104:** https://github.com/Xilinx/Vitis-In-Depth-Tutorial/tree/master/Machine_Learning/Design_Tutorials/07-yolov4-tutorial +- **Amazon Neurochip / Amazon EC2 Inf1 instances** 1.85 times higher throughput and 37% lower cost per image for TensorFlow based YOLOv4 model, using Keras [URL](https://aws.amazon.com/ru/blogs/machine-learning/improving-performance-for-deep-learning-based-object-detection-with-an-aws-neuron-compiled-yolov4-model-on-aws-inferentia/) +- **TVM** - compilation of deep learning models (Keras, MXNet, PyTorch, Tensorflow, CoreML, DarkNet) into minimum deployable modules on diverse hardware backend (CPUs, GPUs, FPGA, and specialized accelerators): https://tvm.ai/about +- **Tencent/ncnn:** the fastest inference of YOLOv4 on mobile phone CPU: https://github.com/Tencent/ncnn +- **OpenDataCam** - It detects, tracks and counts moving objects by using YOLOv4: https://github.com/opendatacam/opendatacam#-hardware-pre-requisite +- **Netron** - Visualizer for neural networks: https://github.com/lutzroeder/netron + +#### Datasets + +- MS COCO: use `./scripts/get_coco_dataset.sh` to get labeled MS COCO detection dataset +- OpenImages: use `python ./scripts/get_openimages_dataset.py` for labeling train detection dataset +- Pascal VOC: use `python ./scripts/voc_label.py` for labeling Train/Test/Val detection datasets +- ILSVRC2012 (ImageNet classification): use `./scripts/get_imagenet_train.sh` (also `imagenet_label.sh` for labeling valid set) +- German/Belgium/Russian/LISA/MASTIF Traffic Sign Datasets for Detection - use this parser: https://github.com/angeligareta/Datasets2Darknet#detection-task +- List of other datasets: https://github.com/AlexeyAB/darknet/tree/master/scripts#datasets + +### Improvements in this repository + +- developed State-of-the-Art object detector YOLOv4 +- added State-of-Art models: CSP, PRN, EfficientNet +- added layers: [conv_lstm], [scale_channels] SE/ASFF/BiFPN, [local_avgpool], [sam], [Gaussian_yolo], [reorg3d] (fixed [reorg]), fixed [batchnorm] +- added the ability for training recurrent models (with layers conv-lstm`[conv_lstm]`/conv-rnn`[crnn]`) for accurate detection on video +- added data augmentation: `[net] mixup=1 cutmix=1 mosaic=1 blur=1`. Added activations: SWISH, MISH, NORM_CHAN, NORM_CHAN_SOFTMAX +- added the ability for training with GPU-processing using CPU-RAM to increase the mini_batch_size and increase accuracy (instead of batch-norm sync) +- improved binary neural network performance **2x-4x times** for Detection on CPU and GPU if you trained your own weights by using this XNOR-net model (bit-1 inference) : https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov3-tiny_xnor.cfg +- improved neural network performance **~7%** by fusing 2 layers into 1: Convolutional + Batch-norm +- improved performance: Detection **2x times**, on GPU Volta/Turing (Tesla V100, GeForce RTX, ...) using Tensor Cores if `CUDNN_HALF` defined in the `Makefile` or `darknet.sln` +- improved performance **~1.2x** times on FullHD, **~2x** times on 4K, for detection on the video (file/stream) using `darknet detector demo`... +- improved performance **3.5 X times** of data augmentation for training (using OpenCV SSE/AVX functions instead of hand-written functions) - removes bottleneck for training on multi-GPU or GPU Volta +- improved performance of detection and training on Intel CPU with AVX (Yolo v3 **~85%**) +- optimized memory allocation during network resizing when `random=1` +- optimized GPU initialization for detection - we use batch=1 initially instead of re-init with batch=1 +- added correct calculation of **mAP, F1, IoU, Precision-Recall** using command `darknet detector map`... +- added drawing of chart of average-Loss and accuracy-mAP (`-map` flag) during training +- run `./darknet detector demo ... -json_port 8070 -mjpeg_port 8090` as JSON and MJPEG server to get results online over the network by using your soft or Web-browser +- added calculation of anchors for training +- added example of Detection and Tracking objects: https://github.com/AlexeyAB/darknet/blob/master/src/yolo_console_dll.cpp +- run-time tips and warnings if you use incorrect cfg-file or dataset +- added support for Windows +- many other fixes of code... + +And added manual - [How to train Yolo v4-v2 (to detect your custom objects)](#how-to-train-to-detect-your-custom-objects) + +Also, you might be interested in using a simplified repository where is implemented INT8-quantization (+30% speedup and -1% mAP reduced): https://github.com/AlexeyAB/yolo2_light + +#### How to use on the command line + +If you use `build.ps1` script or the makefile (Linux only) you will find `darknet` in the root directory. + +If you use the deprecated Visual Studio solutions, you will find `darknet` in the directory `\build\darknet\x64`. + +If you customize build with CMake GUI, darknet executable will be installed in your preferred folder. + +- Yolo v4 COCO - **image**: `./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights -thresh 0.25` +- **Output coordinates** of objects: `./darknet detector test cfg/coco.data yolov4.cfg yolov4.weights -ext_output dog.jpg` +- Yolo v4 COCO - **video**: `./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights -ext_output test.mp4` +- Yolo v4 COCO - **WebCam 0**: `./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights -c 0` +- Yolo v4 COCO for **net-videocam** - Smart WebCam: `./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights http://192.168.0.80:8080/video?dummy=param.mjpg` +- Yolo v4 - **save result videofile res.avi**: `./darknet detector demo cfg/coco.data cfg/yolov4.cfg yolov4.weights test.mp4 -out_filename res.avi` +- Yolo v3 **Tiny** COCO - video: `./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights test.mp4` +- **JSON and MJPEG server** that allows multiple connections from your soft or Web-browser `ip-address:8070` and 8090: `./darknet detector demo ./cfg/coco.data ./cfg/yolov3.cfg ./yolov3.weights test50.mp4 -json_port 8070 -mjpeg_port 8090 -ext_output` +- Yolo v3 Tiny **on GPU #1**: `./darknet detector demo cfg/coco.data cfg/yolov3-tiny.cfg yolov3-tiny.weights -i 1 test.mp4` +- Alternative method Yolo v3 COCO - image: `./darknet detect cfg/yolov4.cfg yolov4.weights -i 0 -thresh 0.25` +- Train on **Amazon EC2**, to see mAP & Loss-chart using URL like: `http://ec2-35-160-228-91.us-west-2.compute.amazonaws.com:8090` in the Chrome/Firefox (**Darknet should be compiled with OpenCV**): + `./darknet detector train cfg/coco.data yolov4.cfg yolov4.conv.137 -dont_show -mjpeg_port 8090 -map` +- 186 MB Yolo9000 - image: `./darknet detector test cfg/combine9k.data cfg/yolo9000.cfg yolo9000.weights` +- Remember to put data/9k.tree and data/coco9k.map under the same folder of your app if you use the cpp api to build an app +- To process a list of images `data/train.txt` and save results of detection to `result.json` file use: + `./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights -ext_output -dont_show -out result.json < data/train.txt` +- To process a list of images `data/train.txt` and save results of detection to `result.txt` use: + `./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights -dont_show -ext_output < data/train.txt > result.txt` +- To process a video and output results to a json file use: `darknet.exe detector demo cfg/coco.data cfg/yolov3.cfg yolov3.weights file.mp4 -dont_show -json_file_output results.json` +- Pseudo-labelling - to process a list of images `data/new_train.txt` and save results of detection in Yolo training format for each image as label `.txt` (in this way you can increase the amount of training data) use: + `./darknet detector test cfg/coco.data cfg/yolov4.cfg yolov4.weights -thresh 0.25 -dont_show -save_labels < data/new_train.txt` +- To calculate anchors: `./darknet detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416` +- To check accuracy mAP@IoU=50: `./darknet detector map data/obj.data yolo-obj.cfg backup\yolo-obj_7000.weights` +- To check accuracy mAP@IoU=75: `./darknet detector map data/obj.data yolo-obj.cfg backup\yolo-obj_7000.weights -iou_thresh 0.75` + +##### For using network video-camera mjpeg-stream with any Android smartphone -4. If you have **OpenCV 2.4.13** instead of 3.0 then you should change pathes after `\darknet.sln` is opened +1. Download for Android phone mjpeg-stream soft: IP Webcam / Smart WebCam - 4.1 (right click on project) -> properties -> C/C++ -> General -> Additional Include Directories: `C:\opencv_2.4.13\opencv\build\include` - - 4.2 (right click on project) -> properties -> Linker -> General -> Additional Library Directories: `C:\opencv_2.4.13\opencv\build\x64\vc14\lib` - -5. If you have GPU with Tensor Cores (nVidia Titan V / Tesla V100 / DGX-2 and later) speedup Detection 3x, Training 2x: - `\darknet.sln` -> (right click on project) -> properties -> C/C++ -> Preprocessor -> Preprocessor Definitions, and add here: `CUDNN_HALF;` + - Smart WebCam - preferably: https://play.google.com/store/apps/details?id=com.acontech.android.SmartWebCam2 + - IP Webcam: https://play.google.com/store/apps/details?id=com.pas.webcam -### How to compile (custom): +2. Connect your Android phone to the computer by WiFi (through a WiFi-router) or USB +3. Start Smart WebCam on your phone +4. Replace the address below, shown in the phone application (Smart WebCam) and launch: -Also, you can to create your own `darknet.sln` & `darknet.vcxproj`, this example for CUDA 9.1 and OpenCV 3.0 +- Yolo v4 COCO-model: `./darknet detector demo data/coco.data yolov4.cfg yolov4.weights http://192.168.0.80:8080/video?dummy=param.mjpg -i 0` -Then add to your created project: -- (right click on project) -> properties -> C/C++ -> General -> Additional Include Directories, put here: +### How to compile on Linux/macOS (using `CMake`) -`C:\opencv_3.0\opencv\build\include;..\..\3rdparty\include;%(AdditionalIncludeDirectories);$(CudaToolkitIncludeDir);$(cudnn)\include` -- (right click on project) -> Build dependecies -> Build Customizations -> set check on CUDA 9.1 or what version you have - for example as here: http://devblogs.nvidia.com/parallelforall/wp-content/uploads/2015/01/VS2013-R-5.jpg -- add to project all `.c` & `.cu` files and file `http_stream.cpp` from `\src` -- (right click on project) -> properties -> Linker -> General -> Additional Library Directories, put here: +The `CMakeLists.txt` will attempt to find installed optional dependencies like CUDA, cudnn, ZED and build against those. It will also create a shared object library file to use `darknet` for code development. -`C:\opencv_3.0\opencv\build\x64\vc14\lib;$(CUDA_PATH)lib\$(PlatformName);$(cudnn)\lib\x64;%(AdditionalLibraryDirectories)` -- (right click on project) -> properties -> Linker -> Input -> Additional dependecies, put here: +To update CMake on Ubuntu, it's better to follow guide here: https://apt.kitware.com/ or https://cmake.org/download/ -`..\..\3rdparty\lib\x64\pthreadVC2.lib;cublas.lib;curand.lib;cudart.lib;cudnn.lib;%(AdditionalDependencies)` -- (right click on project) -> properties -> C/C++ -> Preprocessor -> Preprocessor Definitions +```bash +git clone https://github.com/AlexeyAB/darknet --recurse-submodules +cd darknet +mkdir build_release +cd build_release +cmake .. +cmake --build . --target install --parallel 8 +``` -`OPENCV;_TIMESPEC_DEFINED;_CRT_SECURE_NO_WARNINGS;_CRT_RAND_S;WIN32;NDEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions)` +### Using also PowerShell -- compile to .exe (X64 & Release) and put .dll-s near with .exe: +Install: `Cmake`, `CUDA`, `cuDNN` [How to install dependencies](#requirements) - * `pthreadVC2.dll, pthreadGC2.dll` from \3rdparty\dll\x64 +Install powershell for your OS (Linux or MacOS) ([guide here](https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell)). - * `cusolver64_91.dll, curand64_91.dll, cudart64_91.dll, cublas64_91.dll` - 91 for CUDA 9.1 or your version, from C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.1\bin +Open PowerShell type these commands - * For OpenCV 3.2: `opencv_world320.dll` and `opencv_ffmpeg320_64.dll` from `C:\opencv_3.0\opencv\build\x64\vc14\bin` - * For OpenCV 2.4.13: `opencv_core2413.dll`, `opencv_highgui2413.dll` and `opencv_ffmpeg2413_64.dll` from `C:\opencv_2.4.13\opencv\build\x64\vc14\bin` +```PowerShell +git clone https://github.com/AlexeyAB/darknet --recurse-submodules +cd darknet +./CCM/build.ps1 -UseVCPKG -EnableOPENCV -EnableCUDA -EnableCUDNN +``` -## How to train (Pascal VOC Data): +- remove options like `-EnableCUDA` or `-EnableCUDNN` if you are not interested into +- remove option `-UseVCPKG` if you plan to manually provide OpenCV library to darknet or if you do not want to enable OpenCV integration +- add option `-EnableOPENCV_CUDA` if you want to build OpenCV with CUDA support - very slow to build! (requires `-UseVCPKG`) -1. Download pre-trained weights for the convolutional layers (154 MB): http://pjreddie.com/media/files/darknet53.conv.74 and put to the directory `build\darknet\x64` +If you open the `build.ps1` script at the beginning you will find all available switches. -2. Download The Pascal VOC Data and unpack it to directory `build\darknet\x64\data\voc` will be created dir `build\darknet\x64\data\voc\VOCdevkit\`: - * http://pjreddie.com/media/files/VOCtrainval_11-May-2012.tar - * http://pjreddie.com/media/files/VOCtrainval_06-Nov-2007.tar - * http://pjreddie.com/media/files/VOCtest_06-Nov-2007.tar - - 2.1 Download file `voc_label.py` to dir `build\darknet\x64\data\voc`: http://pjreddie.com/media/files/voc_label.py +### How to compile on Linux (using `make`) -3. Download and install Python for Windows: https://www.python.org/ftp/python/3.5.2/python-3.5.2-amd64.exe +Just do `make` in the darknet directory. (You can try to compile and run it on Google Colab in cloud [link](https://colab.research.google.com/drive/12QusaaRj_lUwCGDvQNfICpa7kA7_a2dE) (press «Open in Playground» button at the top-left corner) and watch the video [link](https://www.youtube.com/watch?v=mKAEGSxwOAY) ) +Before make, you can set such options in the `Makefile`: [link](https://github.com/AlexeyAB/darknet/blob/9c1b9a2cf6363546c152251be578a21f3c3caec6/Makefile#L1) -4. Run command: `python build\darknet\x64\data\voc\voc_label.py` (to generate files: 2007_test.txt, 2007_train.txt, 2007_val.txt, 2012_train.txt, 2012_val.txt) +- `GPU=1` to build with CUDA to accelerate by using GPU (CUDA should be in `/usr/local/cuda`) +- `CUDNN=1` to build with cuDNN v5-v7 to accelerate training by using GPU (cuDNN should be in `/usr/local/cudnn`) +- `CUDNN_HALF=1` to build for Tensor Cores (on Titan V / Tesla V100 / DGX-2 and later) speedup Detection 3x, Training 2x +- `OPENCV=1` to build with OpenCV 4.x/3.x/2.4.x - allows to detect on video files and video streams from network cameras or web-cams +- `DEBUG=1` to build debug version of Yolo +- `OPENMP=1` to build with OpenMP support to accelerate Yolo by using multi-core CPU +- `LIBSO=1` to build a library `darknet.so` and binary runnable file `uselib` that uses this library. Or you can try to run so `LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH ./uselib test.mp4` How to use this SO-library from your own code - you can look at C++ example: https://github.com/AlexeyAB/darknet/blob/master/src/yolo_console_dll.cpp + or use in such a way: `LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov4.cfg yolov4.weights test.mp4` +- `ZED_CAMERA=1` to build a library with ZED-3D-camera support (should be ZED SDK installed), then run + `LD_LIBRARY_PATH=./:$LD_LIBRARY_PATH ./uselib data/coco.names cfg/yolov4.cfg yolov4.weights zed_camera` +- You also need to specify for which graphics card the code is generated. This is done by setting `ARCH=`. If you use a newer version than CUDA 11 you further need to edit line 20 from Makefile and remove `-gencode arch=compute_30,code=sm_30 \` as Kepler GPU support was dropped in CUDA 11. You can also drop the general `ARCH=` and just uncomment `ARCH=` for your graphics card. -5. Run command: `type 2007_train.txt 2007_val.txt 2012_*.txt > train.txt` +### How to compile on Windows (using `CMake`) -6. Set `batch=64` and `subdivisions=8` in the file `yolov3-voc.cfg`: [link](https://github.com/AlexeyAB/darknet/blob/ee38c6e1513fb089b35be4ffa692afd9b3f65747/cfg/yolov3-voc.cfg#L3-L4) +Requires: -7. Start training by using `train_voc.cmd` or by using the command line: +- MSVC: https://visualstudio.microsoft.com/thank-you-downloading-visual-studio/?sku=Community +- CMake GUI: `Windows win64-x64 Installer`https://cmake.org/download/ +- Download Darknet zip-archive with the latest commit and uncompress it: [master.zip](https://github.com/AlexeyAB/darknet/archive/master.zip) - `darknet.exe detector train data/voc.data cfg/yolov3-voc.cfg darknet53.conv.74` +In Windows: -(**Note:** To disable Loss-Window use flag `-dont_show`. If you are using CPU, try `darknet_no_gpu.exe` instead of `darknet.exe`.) +- Start (button) -> All programs -> CMake -> CMake (gui) -> -If required change pathes in the file `build\darknet\x64\data\voc.data` +- [look at image](https://habrastorage.org/webt/pz/s1/uu/pzs1uu4heb7vflfcjqn-lxy-aqu.jpeg) In CMake: Enter input path to the darknet Source, and output path to the Binaries -> Configure (button) -> Optional platform for generator: `x64` -> Finish -> Generate -> Open Project -> -More information about training by the link: http://pjreddie.com/darknet/yolo/#train-voc +- in MS Visual Studio: Select: x64 and Release -> Build -> Build solution - **Note:** If during training you see `nan` values in some lines then training goes well, but if `nan` are in all lines then training goes wrong. +- find the executable file `darknet.exe` in the output path to the binaries you specified -## How to train with multi-GPU: +![x64 and Release](https://habrastorage.org/webt/ay/ty/f-/aytyf-8bufe7q-16yoecommlwys.jpeg) -1. Train it first on 1 GPU for like 1000 iterations: `darknet.exe detector train data/voc.data cfg/yolov3-voc.cfg darknet53.conv.74` +### How to compile on Windows (using `vcpkg`) -2. Then stop and by using partially-trained model `/backup/yolov3-voc_1000.weights` run training with multigpu (up to 4 GPUs): `darknet.exe detector train data/voc.data cfg/yolov3-voc.cfg /backup/yolov3-voc_1000.weights -gpus 0,1,2,3` +This is the recommended approach to build Darknet on Windows. -https://groups.google.com/d/msg/darknet/NbJqonJBTSY/Te5PfIpuCAAJ +1. Install Visual Studio 2017 or 2019. In case you need to download it, please go here: [Visual Studio Community](http://visualstudio.com). Remember to install English language pack, this is mandatory for vcpkg! -## How to train (to detect your custom objects): -(to train old Yolo v2 `yolov2-voc.cfg`, `yolov2-tiny-voc.cfg`, `yolo-voc.cfg`, `yolo-voc.2.0.cfg`, ... [click by the link](https://github.com/AlexeyAB/darknet/tree/47c7af1cea5bbdedf1184963355e6418cb8b1b4f#how-to-train-pascal-voc-data)) +2. Install CUDA enabling VS Integration during installation. -Training Yolo v3: +3. Open Powershell (Start -> All programs -> Windows Powershell) and type these commands: -1. Create file `yolo-obj.cfg` with the same content as in `yolov3.cfg` (or copy `yolov3.cfg` to `yolo-obj.cfg)` and: +```PowerShell +Set-ExecutionPolicy unrestricted -Scope CurrentUser -Force +git clone https://github.com/AlexeyAB/darknet --recurse-submodules +cd darknet +.\CCM\build.ps1 -UseVCPKG -EnableOPENCV -EnableCUDA -EnableCUDNN +``` - * change line batch to [`batch=64`](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L3) - * change line subdivisions to [`subdivisions=8`](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L4) - * change line `classes=80` to your number of objects in each of 3 `[yolo]`-layers: - * https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L610 - * https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L696 - * https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L783 - * change [`filters=255`] to filters=(classes + 5)x3 in the 3 `[convolutional]` before each `[yolo]` layer - * https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L603 - * https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L689 - * https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L776 +(add option `-EnableOPENCV_CUDA` if you want to build OpenCV with CUDA support - very slow to build! - or remove options like `-EnableCUDA` or `-EnableCUDNN` if you are not interested in them). If you open the `build.ps1` script at the beginning you will find all available switches. - So if `classes=1` then should be `filters=18`. If `classes=2` then write `filters=21`. - - **(Do not write in the cfg-file: filters=(classes + 5)x3)** - - (Generally `filters` depends on the `classes`, `coords` and number of `mask`s, i.e. filters=`(classes + coords + 1)*`, where `mask` is indices of anchors. If `mask` is absence, then filters=`(classes + coords + 1)*num`) +## How to train with multi-GPU - So for example, for 2 objects, your file `yolo-obj.cfg` should differ from `yolov3.cfg` in such lines in each of **3** [yolo]-layers: +1. Train it first on 1 GPU for like 1000 iterations: `darknet.exe detector train cfg/coco.data cfg/yolov4.cfg yolov4.conv.137` - ``` - [convolutional] - filters=21 +2. Then stop and by using partially-trained model `/backup/yolov4_1000.weights` run training with multigpu (up to 4 GPUs): `darknet.exe detector train cfg/coco.data cfg/yolov4.cfg /backup/yolov4_1000.weights -gpus 0,1,2,3` - [region] - classes=2 - ``` +If you get a Nan, then for some datasets better to decrease learning rate, for 4 GPUs set `learning_rate = 0,00065` (i.e. learning_rate = 0.00261 / GPUs). In this case also increase 4x times `burn_in =` in your cfg-file. I.e. use `burn_in = 4000` instead of `1000`. -2. Create file `obj.names` in the directory `build\darknet\x64\data\`, with objects names - each in new line +https://groups.google.com/d/msg/darknet/NbJqonJBTSY/Te5PfIpuCAAJ + +## How to train (to detect your custom objects) + +(to train old Yolo v2 `yolov2-voc.cfg`, `yolov2-tiny-voc.cfg`, `yolo-voc.cfg`, `yolo-voc.2.0.cfg`, ... [click by the link](https://github.com/AlexeyAB/darknet/tree/47c7af1cea5bbdedf1184963355e6418cb8b1b4f#how-to-train-pascal-voc-data)) +Training Yolo v4 (and v3): + +0. For training `cfg/yolov4-custom.cfg` download the pre-trained weights-file (162 MB): [yolov4.conv.137](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.conv.137) (Google drive mirror [yolov4.conv.137](https://drive.google.com/open?id=1JKF-bdIklxOOVy-2Cr5qdvjgGpmGfcbp) ) +1. Create file `yolo-obj.cfg` with the same content as in `yolov4-custom.cfg` (or copy `yolov4-custom.cfg` to `yolo-obj.cfg)` and: + +- change line batch to [`batch=64`](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L3) +- change line subdivisions to [`subdivisions=16`](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L4) +- change line max_batches to (`classes*2000`, but not less than number of training images and not less than `6000`), f.e. [`max_batches=6000`](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L20) if you train for 3 classes +- change line steps to 80% and 90% of max_batches, f.e. [`steps=4800,5400`](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L22) +- set network size `width=416 height=416` or any value multiple of 32: https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L8-L9 +- change line `classes=80` to your number of objects in each of 3 `[yolo]`-layers: + - https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L610 + - https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L696 + - https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L783 +- change [`filters=255`] to filters=(classes + 5)x3 in the 3 `[convolutional]` before each `[yolo]` layer, keep in mind that it only has to be the last `[convolutional]` before each of the `[yolo]` layers. + - https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L603 + - https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L689 + - https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L776 +- when using [`[Gaussian_yolo]`](https://github.com/AlexeyAB/darknet/blob/6e5bdf1282ad6b06ed0e962c3f5be67cf63d96dc/cfg/Gaussian_yolov3_BDD.cfg#L608) layers, change [`filters=57`] filters=(classes + 9)x3 in the 3 `[convolutional]` before each `[Gaussian_yolo]` layer + - https://github.com/AlexeyAB/darknet/blob/6e5bdf1282ad6b06ed0e962c3f5be67cf63d96dc/cfg/Gaussian_yolov3_BDD.cfg#L604 + - https://github.com/AlexeyAB/darknet/blob/6e5bdf1282ad6b06ed0e962c3f5be67cf63d96dc/cfg/Gaussian_yolov3_BDD.cfg#L696 + - https://github.com/AlexeyAB/darknet/blob/6e5bdf1282ad6b06ed0e962c3f5be67cf63d96dc/cfg/Gaussian_yolov3_BDD.cfg#L789 + +So if `classes=1` then should be `filters=18`. If `classes=2` then write `filters=21`. +**(Do not write in the cfg-file: filters=(classes + 5)x3)** + +(Generally `filters` depends on the `classes`, `coords` and number of `mask`s, i.e. filters=`(classes + coords + 1)*`, where `mask` is indices of anchors. If `mask` is absence, then filters=`(classes + coords + 1)*num`) + +So for example, for 2 objects, your file `yolo-obj.cfg` should differ from `yolov4-custom.cfg` in such lines in each of **3** [yolo]-layers: + +```ini +[convolutional] +filters=21 + +[region] +classes=2 +``` + +2. Create file `obj.names` in the directory `build\darknet\x64\data\`, with objects names - each in new line 3. Create file `obj.data` in the directory `build\darknet\x64\data\`, containing (where **classes = number of objects**): - ``` - classes= 2 + ```ini + classes = 2 train = data/train.txt valid = data/test.txt names = data/obj.names @@ -275,20 +534,22 @@ Training Yolo v3: ``` 4. Put image-files (.jpg) of your objects in the directory `build\darknet\x64\data\obj\` - 5. You should label each object on images from your dataset. Use this visual GUI-software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: https://github.com/AlexeyAB/Yolo_mark -It will create `.txt`-file for each `.jpg`-image-file - in the same directory and with the same name, but with `.txt`-extension, and put to file: object number and object coordinates on this image, for each object in new line: ` ` +It will create `.txt`-file for each `.jpg`-image-file - in the same directory and with the same name, but with `.txt`-extension, and put to file: object number and object coordinates on this image, for each object in new line: + +` ` + + Where: - Where: - * `` - integer number of object from `0` to `(classes-1)` - * ` ` - float values relative to width and height of image, it can be equal from (0.0 to 1.0] - * for example: ` = / ` or ` = / ` - * atention: ` ` - are center of rectangle (are not top-left corner) +- `` - integer object number from `0` to `(classes-1)` +- ` ` - float values **relative** to width and height of image, it can be equal from `(0.0 to 1.0]` +- for example: ` = / ` or ` = / ` +- attention: ` ` - are center of rectangle (are not top-left corner) For example for `img1.jpg` you will be created `img1.txt` containing: - ``` + ```csv 1 0.716797 0.395833 0.216406 0.147222 0 0.687109 0.379167 0.255469 0.158333 1 0.420312 0.395833 0.140625 0.166667 @@ -296,62 +557,85 @@ It will create `.txt`-file for each `.jpg`-image-file - in the same directory an 6. Create file `train.txt` in directory `build\darknet\x64\data\`, with filenames of your images, each filename in new line, with path relative to `darknet.exe`, for example containing: - ``` + ```csv data/obj/img1.jpg data/obj/img2.jpg data/obj/img3.jpg ``` -7. Download pre-trained weights for the convolutional layers (154 MB): https://pjreddie.com/media/files/darknet53.conv.74 and put to the directory `build\darknet\x64` +7. Download pre-trained weights for the convolutional layers and put to the directory `build\darknet\x64` + - for `yolov4.cfg`, `yolov4-custom.cfg` (162 MB): [yolov4.conv.137](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.conv.137) (Google drive mirror [yolov4.conv.137](https://drive.google.com/open?id=1JKF-bdIklxOOVy-2Cr5qdvjgGpmGfcbp) ) + - for `yolov4-tiny.cfg`, `yolov4-tiny-3l.cfg`, `yolov4-tiny-custom.cfg` (19 MB): [yolov4-tiny.conv.29](https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29) + - for `csresnext50-panet-spp.cfg` (133 MB): [csresnext50-panet-spp.conv.112](https://drive.google.com/file/d/16yMYCLQTY_oDlCIZPfn_sab6KD3zgzGq/view?usp=sharing) + - for `yolov3.cfg, yolov3-spp.cfg` (154 MB): [darknet53.conv.74](https://pjreddie.com/media/files/darknet53.conv.74) + - for `yolov3-tiny-prn.cfg , yolov3-tiny.cfg` (6 MB): [yolov3-tiny.conv.11](https://drive.google.com/file/d/18v36esoXCh-PsOKwyP2GWrpYDptDY8Zf/view?usp=sharing) + - for `enet-coco.cfg (EfficientNetB0-Yolov3)` (14 MB): [enetb0-coco.conv.132](https://drive.google.com/file/d/1uhh3D6RSn0ekgmsaTcl-ZW53WBaUDo6j/view?usp=sharing) + +8. Start training by using the command line: `darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137` + + To train on Linux use command: `./darknet detector train data/obj.data yolo-obj.cfg yolov4.conv.137` (just use `./darknet` instead of `darknet.exe`) -8. Start training by using the command line: `darknet.exe detector train data/obj.data yolo-obj.cfg darknet53.conv.74` + - (file `yolo-obj_last.weights` will be saved to the `build\darknet\x64\backup\` for each 100 iterations) + - (file `yolo-obj_xxxx.weights` will be saved to the `build\darknet\x64\backup\` for each 1000 iterations) + - (to disable Loss-Window use `darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -dont_show`, if you train on computer without monitor like a cloud Amazon EC2) + - (to see the mAP & Loss-chart during training on remote server without GUI, use command `darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -dont_show -mjpeg_port 8090 -map` then open URL `http://ip-address:8090` in Chrome/Firefox browser) - (file `yolo-obj_xxx.weights` will be saved to the `build\darknet\x64\backup\` for each 100 iterations) - (To disable Loss-Window use `darknet.exe detector train data/obj.data yolo-obj.cfg darknet53.conv.74 -dont_show`, if you train on computer without monitor like a cloud Amazaon EC2) +8.1. For training with mAP (mean average precisions) calculation for each 4 Epochs (set `valid=valid.txt` or `train.txt` in `obj.data` file) and run: `darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -map` + +8.2. One can also set the `-mAP_epochs` in the training command if less or more frequent mAP calculation is needed. For example in order to calculate mAP for each 2 Epochs run `darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -map -mAP_epochs 2` 9. After training is complete - get result `yolo-obj_final.weights` from path `build\darknet\x64\backup\` - * After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just copy `yolo-obj_2000.weights` from `build\darknet\x64\backup\` to `build\darknet\x64\` and start training using: `darknet.exe detector train data/obj.data yolo-obj.cfg yolo-obj_2000.weights` + - After each 100 iterations you can stop and later start training from this point. For example, after 2000 iterations you can stop training, and later just start training using: `darknet.exe detector train data/obj.data yolo-obj.cfg backup\yolo-obj_2000.weights` (in the original repository https://github.com/pjreddie/darknet the weights-file is saved only once every 10 000 iterations `if(iterations > 1000)`) - * Also you can get result earlier than all 45000 iterations. - - **Note:** If during training you see `nan` values in some lines then training goes well, but if `nan` are in all lines then training goes wrong. - -### How to train tiny-yolo (to detect your custom objects): + - Also you can get result earlier than all 45000 iterations. + + **Note:** If during training you see `nan` values for `avg` (loss) field - then training goes wrong, but if `nan` is in some other lines - then training goes well. + + **Note:** If you changed width= or height= in your cfg-file, then new width and height must be divisible by 32. + + **Note:** After training use such command for detection: `darknet.exe detector test data/obj.data yolo-obj.cfg yolo-obj_8000.weights` + + **Note:** if error `Out of memory` occurs then in `.cfg`-file you should increase `subdivisions=16`, 32 or 64: [link](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L4) + +### How to train tiny-yolo (to detect your custom objects) Do all the same steps as for the full yolo model as described above. With the exception of: -* Download default weights file for yolov2-tiny-voc: http://pjreddie.com/media/files/yolov2-tiny-voc.weights -* Get pre-trained weights yolov2-tiny-voc.conv.13 using command: `darknet.exe partial cfg/yolov2-tiny-voc.cfg yolov2-tiny-voc.weights yolov2-tiny-voc.conv.13 13` -* Make your custom model `yolov2-tiny-obj.cfg` based on `cfg/yolov2-tiny-voc.cfg` instead of `yolov3.cfg` -* Start training: `darknet.exe detector train data/obj.data yolov2-tiny-obj.cfg yolov2-tiny-voc.conv.13` + +- Download file with the first 29-convolutional layers of yolov4-tiny: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29 + (Or get this file from yolov4-tiny.weights file by using command: `darknet.exe partial cfg/yolov4-tiny-custom.cfg yolov4-tiny.weights yolov4-tiny.conv.29 29` +- Make your custom model `yolov4-tiny-obj.cfg` based on `cfg/yolov4-tiny-custom.cfg` instead of `yolov4.cfg` +- Start training: `darknet.exe detector train data/obj.data yolov4-tiny-obj.cfg yolov4-tiny.conv.29` For training Yolo based on other models ([DenseNet201-Yolo](https://github.com/AlexeyAB/darknet/blob/master/build/darknet/x64/densenet201_yolo.cfg) or [ResNet50-Yolo](https://github.com/AlexeyAB/darknet/blob/master/build/darknet/x64/resnet50_yolo.cfg)), you can download and get pre-trained weights as showed in this file: https://github.com/AlexeyAB/darknet/blob/master/build/darknet/x64/partial.cmd If you made you custom model that isn't based on other models, then you can train it without pre-trained weights, then will be used random initial weights. - -## When should I stop training: -Usually sufficient 2000 iterations for each class(object). But for a more precise definition when you should stop training, use the following manual: +## When should I stop training + +Usually sufficient 2000 iterations for each class(object), but not less than number of training images and not less than 6000 iterations in total. But for a more precise definition of when you should stop training, use the following manual: 1. During training, you will see varying indicators of error, and you should stop when no longer decreases **0.XXXXXXX avg**: > Region Avg IOU: 0.798363, Class: 0.893232, Obj: 0.700808, No Obj: 0.004567, Avg Recall: 1.000000, count: 8 > Region Avg IOU: 0.800677, Class: 0.892181, Obj: 0.701590, No Obj: 0.004574, Avg Recall: 1.000000, count: 8 > - > **9002**: 0.211667, **0.060730 avg**, 0.001000 rate, 3.868000 seconds, 576128 images + > **9002**: 0.211667, **0.60730 avg**, 0.001000 rate, 3.868000 seconds, 576128 images > Loaded: 0.000000 seconds - * **9002** - iteration number (number of batch) - * **0.060730 avg** - average loss (error) - **the lower, the better** +- **9002** - iteration number (number of batch) +- **0.60730 avg** - average loss (error) - **the lower, the better** - When you see that average loss **0.xxxxxx avg** no longer decreases at many iterations then you should stop training. + When you see that average loss **0.xxxxxx avg** no longer decreases at many iterations then you should stop training. The final average loss can be from `0.05` (for a small model and easy dataset) to `3.0` (for a big model and a difficult dataset). + + Or if you train with flag `-map` then you will see mAP indicator `Last accuracy mAP@0.5 = 18.50%` in the console - this indicator is better than Loss, so train while mAP increases. 2. Once training is stopped, you should take some of last `.weights`-files from `darknet\build\darknet\x64\backup` and choose the best of them: -For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to overfitting. **Overfitting** - is case when you can detect objects on images from training-dataset, but can't detect ojbects on any others images. You should get weights from **Early Stopping Point**: +For example, you stopped training after 9000 iterations, but the best result can give one of previous weights (7000, 8000, 9000). It can happen due to over-fitting. **Over-fitting** - is case when you can detect objects on images from training-dataset, but can't detect objects on any others images. You should get weights from **Early Stopping Point**: -![Overfitting](https://hsto.org/files/5dc/7ae/7fa/5dc7ae7fad9d4e3eb3a484c58bfc1ff5.png) +![Over-fitting](https://hsto.org/files/5dc/7ae/7fa/5dc7ae7fad9d4e3eb3a484c58bfc1ff5.png) To get weights from Early Stopping Point: @@ -361,132 +645,206 @@ To get weights from Early Stopping Point: (If you use another GitHub repository, then use `darknet.exe detector recall`... instead of `darknet.exe detector map`...) -* `darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_7000.weights` -* `darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_8000.weights` -* `darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_9000.weights` +- `darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_7000.weights` +- `darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_8000.weights` +- `darknet.exe detector map data/obj.data yolo-obj.cfg backup\yolo-obj_9000.weights` + +And compare last output lines for each weights (7000, 8000, 9000): -And comapre last output lines for each weights (7000, 8000, 9000): +Choose weights-file **with the highest mAP (mean average precision)** or IoU (intersect over union) -Choose weights-file **with the highest IoU** (intersect of union) and mAP (mean average precision) +For example, **bigger mAP** gives weights `yolo-obj_8000.weights` - then **use this weights for detection**. -For example, **bigger IOU** gives weights `yolo-obj_8000.weights` - then **use this weights for detection**. +Or just train with `-map` flag: + +`darknet.exe detector train data/obj.data yolo-obj.cfg yolov4.conv.137 -map` + +So you will see mAP-chart (red-line) in the Loss-chart Window. mAP will be calculated for each 4 Epochs using `valid=valid.txt` file that is specified in `obj.data` file (`1 Epoch = images_in_train_txt / batch` iterations) + +(to change the max x-axis value - change [`max_batches=`](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L20) parameter to `2000*classes`, f.e. `max_batches=6000` for 3 classes) + +![loss_chart_map_chart](https://hsto.org/webt/yd/vl/ag/ydvlagutof2zcnjodstgroen8ac.jpeg) Example of custom object detection: `darknet.exe detector test data/obj.data yolo-obj.cfg yolo-obj_8000.weights` -* **IoU** (intersect of union) - average instersect of union of objects and detections for a certain threshold = 0.24 +- **IoU** (intersect over union) - average intersect over union of objects and detections for a certain threshold = 0.24 -* **mAP** (mean average precision) - mean value of `average precisions` for each class, where `average precision` is average value of 11 points on PR-curve for each possible threshold (each probability of detection) for the same class (Precision-Recall in terms of PascalVOC, where Precision=TP/(TP+FP) and Recall=TP/(TP+FN) ), page-11: http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf +- **mAP** (mean average precision) - mean value of `average precisions` for each class, where `average precision` is average value of 11 points on PR-curve for each possible threshold (each probability of detection) for the same class (Precision-Recall in terms of PascalVOC, where Precision=TP/(TP+FP) and Recall=TP/(TP+FN) ), page-11: http://homepages.inf.ed.ac.uk/ckiw/postscript/ijcv_voc09.pdf **mAP** is default metric of precision in the PascalVOC competition, **this is the same as AP50** metric in the MS COCO competition. In terms of Wiki, indicators Precision and Recall have a slightly different meaning than in the PascalVOC competition, but **IoU always has the same meaning**. ![precision_recall_iou](https://hsto.org/files/ca8/866/d76/ca8866d76fb840228940dbf442a7f06a.jpg) -### How to calculate mAP on PascalVOC 2007: - -1. To calculate mAP (mean average precision) on PascalVOC-2007-test: -* Download PascalVOC dataset, install Python 3.x and get file `2007_test.txt` as described here: https://github.com/AlexeyAB/darknet#how-to-train-pascal-voc-data -* Then download file https://raw.githubusercontent.com/AlexeyAB/darknet/master/scripts/voc_label_difficult.py to the dir `build\darknet\x64\data\` then run `voc_label_difficult.py` to get the file `difficult_2007_test.txt` -* Remove symbol `#` from this line to un-comment it: https://github.com/AlexeyAB/darknet/blob/master/build/darknet/x64/data/voc.data#L4 -* Then there are 2 ways to get mAP: - 1. Using Darknet + Python: run the file `build/darknet/x64/calc_mAP_voc_py.cmd` - you will get mAP for `yolo-voc.cfg` model, mAP = 75.9% - 2. Using this fork of Darknet: run the file `build/darknet/x64/calc_mAP.cmd` - you will get mAP for `yolo-voc.cfg` model, mAP = 75.8% - - (The article specifies the value of mAP = 76.8% for YOLOv2 416×416, page-4 table-3: https://arxiv.org/pdf/1612.08242v1.pdf. We get values lower - perhaps due to the fact that the model was trained on a slightly different source code than the code on which the detection is was done) - -* if you want to get mAP for `tiny-yolo-voc.cfg` model, then un-comment line for tiny-yolo-voc.cfg and comment line for yolo-voc.cfg in the .cmd-file -* if you have Python 2.x instead of Python 3.x, and if you use Darknet+Python-way to get mAP, then in your cmd-file use `reval_voc.py` and `voc_eval.py` instead of `reval_voc_py3.py` and `voc_eval_py3.py` from this directory: https://github.com/AlexeyAB/darknet/tree/master/scripts - -### Custom object detection: +### Custom object detection Example of custom object detection: `darknet.exe detector test data/obj.data yolo-obj.cfg yolo-obj_8000.weights` | ![Yolo_v2_training](https://hsto.org/files/d12/1e7/515/d121e7515f6a4eb694913f10de5f2b61.jpg) | ![Yolo_v2_training](https://hsto.org/files/727/c7e/5e9/727c7e5e99bf4d4aa34027bb6a5e4bab.jpg) | |---|---| -## How to improve object detection: +## How to improve object detection 1. Before training: - * set flag `random=1` in your `.cfg`-file - it will increase precision by training Yolo for different resolutions: [link](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L788) - * increase network resolution in your `.cfg`-file (`height=608`, `width=608` or any value multiple of 32) - it will increase precision +- set flag `random=1` in your `.cfg`-file - it will increase precision by training Yolo for different resolutions: [link](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L788) - * recalculate anchors for your dataset for `width` and `height` from cfg-file: - `darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416` - then set the same 9 `anchors` in each of 3 `[yolo]`-layers in your cfg-file +- increase network resolution in your `.cfg`-file (`height=608`, `width=608` or any value multiple of 32) - it will increase precision - * desirable that your training dataset include images with objects at diffrent: scales, rotations, lightings, from different sides, on different backgrounds +- check that each object that you want to detect is mandatory labeled in your dataset - no one object in your data set should not be without label. In the most training issues - there are wrong labels in your dataset (got labels by using some conversion script, marked with a third-party tool, ...). Always check your dataset by using: https://github.com/AlexeyAB/Yolo_mark - * desirable that your training dataset include images with non-labeled objects that you do not want to detect - negative samples without bounded box (empty `.txt` files) +- my Loss is very high and mAP is very low, is training wrong? Run training with `-show_imgs` flag at the end of training command, do you see correct bounded boxes of objects (in windows or in files `aug_...jpg`)? If no - your training dataset is wrong. - * for training with a large number of objects in each image, add the parameter `max=200` or higher value in the last layer [region] in your cfg-file - - * to speedup training (with decreasing detection accuracy) do Fine-Tuning instead of Transfer-Learning, set param `stopbackward=1` in one of the penultimate convolutional layers before the 1-st `[yolo]`-layer, for example here: https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L598 +- for each object which you want to detect - there must be at least 1 similar object in the Training dataset with about the same: shape, side of object, relative size, angle of rotation, tilt, illumination. So desirable that your training dataset include images with objects at different: scales, rotations, lightings, from different sides, on different backgrounds - you should preferably have 2000 different images for each class or more, and you should train `2000*classes` iterations or more -2. After training - for detection: +- desirable that your training dataset include images with non-labeled objects that you do not want to detect - negative samples without bounded box (empty `.txt` files) - use as many images of negative samples as there are images with objects - * Increase network-resolution by set in your `.cfg`-file (`height=608` and `width=608`) or (`height=832` and `width=832`) or (any value multiple of 32) - this increases the precision and makes it possible to detect small objects: [link](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L8-L9) +- What is the best way to mark objects: label only the visible part of the object, or label the visible and overlapped part of the object, or label a little more than the entire object (with a little gap)? Mark as you like - how would you like it to be detected. + +- for training with a large number of objects in each image, add the parameter `max=200` or higher value in the last `[yolo]`-layer or `[region]`-layer in your cfg-file (the global maximum number of objects that can be detected by YoloV3 is `0,0615234375*(width*height)` where are width and height are parameters from `[net]` section in cfg-file) + +- for training for small objects (smaller than 16x16 after the image is resized to 416x416) - set `layers = 23` instead of https://github.com/AlexeyAB/darknet/blob/6f718c257815a984253346bba8fb7aa756c55090/cfg/yolov4.cfg#L895 + - set `stride=4` instead of https://github.com/AlexeyAB/darknet/blob/6f718c257815a984253346bba8fb7aa756c55090/cfg/yolov4.cfg#L892 + - set `stride=4` instead of https://github.com/AlexeyAB/darknet/blob/6f718c257815a984253346bba8fb7aa756c55090/cfg/yolov4.cfg#L989 - * you do not need to train the network again, just use `.weights`-file already trained for 416x416 resolution - * if error `Out of memory` occurs then in `.cfg`-file you should increase `subdivisions=16`, 32 or 64: [link](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L4) +- for training for both small and large objects use modified models: + - Full-model: 5 yolo layers: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov3_5l.cfg + - Tiny-model: 3 yolo layers: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-tiny-3l.cfg + - YOLOv4: 3 yolo layers: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4-custom.cfg + +- If you train the model to distinguish Left and Right objects as separate classes (left/right hand, left/right-turn on road signs, ...) then for disabling flip data augmentation - add `flip=0` here: https://github.com/AlexeyAB/darknet/blob/3d2d0a7c98dbc8923d9ff705b81ff4f7940ea6ff/cfg/yolov3.cfg#L17 + +- General rule - your training dataset should include such a set of relative sizes of objects that you want to detect: + - `train_network_width * train_obj_width / train_image_width ~= detection_network_width * detection_obj_width / detection_image_width` + - `train_network_height * train_obj_height / train_image_height ~= detection_network_height * detection_obj_height / detection_image_height` + + I.e. for each object from Test dataset there must be at least 1 object in the Training dataset with the same class_id and about the same relative size: -## How to mark bounded boxes of objects and create annotation files: + `object width in percent from Training dataset` ~= `object width in percent from Test dataset` + + That is, if only objects that occupied 80-90% of the image were present in the training set, then the trained network will not be able to detect objects that occupy 1-10% of the image. + +- to speedup training (with decreasing detection accuracy) set param `stopbackward=1` for layer-136 in cfg-file + +- each: `model of object, side, illumination, scale, each 30 grad` of the turn and inclination angles - these are *different objects* from an internal perspective of the neural network. So the more *different objects* you want to detect, the more complex network model should be used. + +- to make the detected bounded boxes more accurate, you can add 3 parameters `ignore_thresh = .9 iou_normalizer=0.5 iou_loss=giou` to each `[yolo]` layer and train, it will increase mAP@0.9, but decrease mAP@0.5. + +- Only if you are an **expert** in neural detection networks - recalculate anchors for your dataset for `width` and `height` from cfg-file: +`darknet.exe detector calc_anchors data/obj.data -num_of_clusters 9 -width 416 -height 416` +then set the same 9 `anchors` in each of 3 `[yolo]`-layers in your cfg-file. But you should change indexes of anchors `masks=` for each [yolo]-layer, so for YOLOv4 the 1st-[yolo]-layer has anchors smaller than 30x30, 2nd smaller than 60x60, 3rd remaining, and vice versa for YOLOv3. Also you should change the `filters=(classes + 5)*` before each [yolo]-layer. If many of the calculated anchors do not fit under the appropriate layers - then just try using all the default anchors. + +2. After training - for detection: -Here you can find repository with GUI-software for marking bounded boxes of objects and generating annotation files for Yolo v2 & v3: https://github.com/AlexeyAB/Yolo_mark +- Increase network-resolution by set in your `.cfg`-file (`height=608` and `width=608`) or (`height=832` and `width=832`) or (any value multiple of 32) - this increases the precision and makes it possible to detect small objects: [link](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L8-L9) -With example of: `train.txt`, `obj.names`, `obj.data`, `yolo-obj.cfg`, `air`1-6`.txt`, `bird`1-4`.txt` for 2 classes of objects (air, bird) and `train_obj.cmd` with example how to train this image-set with Yolo v2 & v3 +- it is not necessary to train the network again, just use `.weights`-file already trained for 416x416 resolution -## Using Yolo9000 +- to get even greater accuracy you should train with higher resolution 608x608 or 832x832, note: if error `Out of memory` occurs then in `.cfg`-file you should increase `subdivisions=16`, 32 or 64: [link](https://github.com/AlexeyAB/darknet/blob/0039fd26786ab5f71d5af725fc18b3f521e7acfd/cfg/yolov3.cfg#L4) - Simultaneous detection and classification of 9000 objects: +## How to mark bounded boxes of objects and create annotation files -* `yolo9000.weights` - (186 MB Yolo9000 Model) requires 4 GB GPU-RAM: http://pjreddie.com/media/files/yolo9000.weights +Here you can find repository with GUI-software for marking bounded boxes of objects and generating annotation files for Yolo v2 - v4: https://github.com/AlexeyAB/Yolo_mark -* `yolo9000.cfg` - cfg-file of the Yolo9000, also there are paths to the `9k.tree` and `coco9k.map` https://github.com/AlexeyAB/darknet/blob/617cf313ccb1fe005db3f7d88dec04a04bd97cc2/cfg/yolo9000.cfg#L217-L218 +With example of: `train.txt`, `obj.names`, `obj.data`, `yolo-obj.cfg`, `air`1-6`.txt`, `bird`1-4`.txt` for 2 classes of objects (air, bird) and `train_obj.cmd` with example how to train this image-set with Yolo v2 - v4 - * `9k.tree` - **WordTree** of 9418 categories - `