Controlling Git Repository Behavior with .gitattributes

A .gitattributes file lets you assign attributes to file paths in a Git repository, giving you explicit control over how Git handles those files during checkouts, merges, diffs, and archives. Line ending inconsistencies, corrupted binary diffs, merge conflicts in generated files, and bloated archive packages are all problems that a well-constructed .gitattributes file prevents before they reach your team.

What Is a .gitattributes File?

A .gitattributes file is a plain-text configuration file that maps path patterns to one or more attributes. Git reads it during operations that touch the working tree or the object database, and it uses the attributes to decide how to process each matched file. The same repository can carry multiple .gitattributes files at different directory levels, and a global file can apply across all repositories on a machine.

Locations

Git reads attributes from four places, listed here from highest to lowest precedence:

Location                                      Scope       Notes
.git/info/attributes                          Repository  Local only; never committed
.gitattributes (root or any subdirectory)     Repository  Committed to the repo; applies to all contributors; a file closer to the path overrides one higher up
core.attributesFile (global config)           User        Applies to all of the user's repositories; defaults to $XDG_CONFIG_HOME/git/attributes
$(prefix)/etc/gitattributes (system config)   System      Applies to all users on the machine

Rules in a subdirectory .gitattributes file apply only to paths within that subdirectory and its descendants. Rules in the repository root apply to everything unless overridden by a more specific file lower in the tree.
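
The lookup order can be observed with git check-attr, which reports the attribute value Git resolves for a path. The following is a minimal sketch in a scratch repository; the file names README.md and docs/guide.md are hypothetical, and check-attr does not require them to exist.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Root rule: Markdown is checked out with LF.
printf '*.md text eol=lf\n' > .gitattributes

# A subdirectory file overrides the root rule for paths beneath it.
mkdir docs
printf '*.md eol=crlf\n' > docs/.gitattributes
committed=$(git check-attr eol -- README.md docs/guide.md)
printf '%s\n' "$committed"
# README.md: eol: lf
# docs/guide.md: eol: crlf

# .git/info/attributes is local to this clone and takes precedence over both.
mkdir -p .git/info
printf 'docs/*.md eol=lf\n' > .git/info/attributes
local_override=$(git check-attr eol -- docs/guide.md)
printf '%s\n' "$local_override"
# docs/guide.md: eol: lf
```

Running check-attr against a few representative paths is a quick way to confirm which file is actually deciding an attribute before debugging anything else.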

Commit the file at the repository root

The .gitattributes file at the repository root is the most important one for team consistency. Because it is committed to the repository, every contributor who clones or fetches gets the same behavior automatically, without any local configuration step.

Syntax

Each non-blank, non-comment line in a .gitattributes file has the form:

<pattern>  <attr> [<attr> ...]

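The pattern follows the same glob rules used in .gitignore, with two exceptions: negative patterns (a leading !) are not allowed, and a pattern that matches a directory does not apply to the files inside it, so a trailing-slash pattern such as dir/ generally needs to be written as dir/** instead. (The export-ignore attribute is the notable case where a directory pattern is useful, because git archive evaluates it against directories as well as files.) Within a pattern, * matches any sequence of characters within a path component, ** matches across path separators, and a leading / anchors the pattern to the directory containing the .gitattributes file. When several lines match a path, the last matching line wins for each attribute, so more specific rules placed later in the file take precedence over broader ones.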

An attribute can be set in one of four states:

Form        Meaning
attr        Set (true)
-attr       Unset (false)
attr=value  Set to a specific value
!attr       Unspecified (use the built-in default)
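
To see the four states and the last-match-wins rule together, the sketch below writes a small .gitattributes into a scratch repository and asks git check-attr what it resolves. The file names (notes.txt, data.txt, plain.txt) are made up for illustration.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

cat > .gitattributes <<'EOF'
*.txt       text diff
notes.txt   -diff
data.txt    eol=crlf
plain.txt   !text
EOF

# check-attr resolves attributes from the path alone; the files need not exist.
states=$(git check-attr text diff eol -- notes.txt data.txt plain.txt)
printf '%s\n' "$states"
# notes.txt: text: set
# notes.txt: diff: unset
# notes.txt: eol: unspecified
# data.txt: text: set
# data.txt: diff: set
# data.txt: eol: crlf
# plain.txt: text: unspecified
# plain.txt: diff: set
# plain.txt: eol: unspecified
```

Each later line changes only the attribute it names; everything else set by the broader *.txt rule is still in effect.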

Line Ending Normalization

Line ending inconsistency is one of the most common sources of noise in cross-platform repositories. Windows tools default to CRLF (\r\n); macOS and Linux use LF (\n). When contributors on different platforms commit files without normalization, a single changed line can produce a diff that touches every line in the file. On top of the diff noise, CRLF in shell scripts, makefiles, and configuration files frequently causes runtime failures on Linux-based CI runners and containers.

The text attribute and its related settings give you explicit control over how Git stores and checks out line endings.

The text Attribute

Setting text on a path tells Git to treat the file as text and normalize line endings when committing and checking out. The exact behavior depends on the eol sub-attribute and the core.eol and core.autocrlf git config values.

Attribute   Effect
text        Normalize on commit and convert on checkout based on platform and config
text=auto   Let Git decide whether the file is text; normalize if it detects text
-text       Disable all line ending conversion for this path
!text       Restore the built-in default for this path
binary      Built-in macro equivalent to -diff -merge -text

The most practical approach for most repositories is a catch-all text=auto rule followed by explicit overrides for the file types you care about:

# Normalize all text files to LF in the repository
*           text=auto
*.sh        text eol=lf
*.bash      text eol=lf
*.py        text eol=lf
*.go        text eol=lf
*.rb        text eol=lf
*.yaml      text eol=lf
*.yml       text eol=lf
*.json      text eol=lf
*.md        text eol=lf
*.txt       text eol=lf
*.html      text eol=lf
*.css       text eol=lf
*.js        text eol=lf
*.ts        text eol=lf

# Files that must use CRLF on checkout (Windows-specific)
*.bat       text eol=crlf
*.cmd       text eol=crlf
*.ps1       text eol=crlf

The text=auto fallback on * handles any file type not listed explicitly. Git inspects the content and normalizes line endings only for files it detects as text. Binary files are left untouched.

The eol=lf overrides on source files ensure that on checkout, regardless of core.eol or core.autocrlf settings on the local machine, the working-tree copy always has LF line endings. This is critical for scripts and files that run inside Linux containers or CI runners.
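
The effect of a text eol=lf rule can be checked with git ls-files --eol, which reports the line endings in the index (i/), in the working tree (w/), and the attributes in force. This is a small sketch in a scratch repository; run.sh is a hypothetical script saved with CRLF endings.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

printf '*.sh text eol=lf\n' > .gitattributes
printf 'echo hello\r\n' > run.sh             # simulate a file saved with CRLF

# Git may warn on stderr that CRLF will be replaced by LF; that is expected.
git add .gitattributes run.sh 2>/dev/null

eol_report=$(git ls-files --eol run.sh)
printf '%s\n' "$eol_report"
# The i/ column reads lf (normalized in the index) while w/ still reads crlf
# (the working-tree copy is rewritten the next time Git checks the file out).
```

The same command without a path argument lists every tracked file, which makes it easy to spot anything still stored with CRLF.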

Renormalizing an Existing Repository

If you add .gitattributes to a repository that already has inconsistent line endings, the existing committed files are not automatically corrected. To renormalize, run:

git add --renormalize .
git commit -S -m "chore: normalize line endings with .gitattributes"

git add --renormalize re-stages every tracked file through the .gitattributes rules, replacing whatever is currently in the index with the normalized version. The resulting commit fixes the files from that point forward without rewriting any existing commits; earlier commits keep the line endings they were made with.
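
To rehearse the change before running it on a real repository, the sketch below stages a CRLF file without any attributes, then adds a text=auto rule and renormalizes. The file name notes.txt is hypothetical, and core.autocrlf is switched off locally only so the CRLF bytes are stored as written.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config core.autocrlf false               # keep the CRLF bytes exactly as written

printf 'line one\r\nline two\r\n' > notes.txt
git add notes.txt
before=$(git ls-files --eol notes.txt)       # index column reads i/crlf

printf '* text=auto\n' > .gitattributes
git add --renormalize . 2>/dev/null          # re-run tracked files through the new rules
after=$(git ls-files --eol notes.txt)        # index column now reads i/lf

printf 'before: %s\nafter:  %s\n' "$before" "$after"
```

On a real repository, git status after the renormalize step shows exactly which files the commit will touch.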

Coordinate the renormalization commit with your team

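A renormalization commit touches every text file whose line endings change, which in a repository with mixed endings can be most of them. In-flight branches are likely to conflict with it. Announce the change, merge it when the branch is quiet, and ask contributors to rebase their branches onto it before continuing. Merging with git merge -X renormalize can reduce spurious line-ending conflicts.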

Binary File Classification

Git uses a heuristic to decide whether a file is binary or text: it looks at the first few kilobytes of content for NUL bytes and a high proportion of non-printable characters. That heuristic is usually right, but it fails for some formats, and the failures cause real problems:

  • Treating a compiled binary as text produces garbled diff output.
  • Treating a binary file as text can corrupt it during a merge or checkout on Windows if core.autocrlf is active.
  • Some binary formats contain no NUL bytes and few non-printable characters, so they look like text to the heuristic and get incorrectly normalized.

The binary attribute is a built-in macro that expands to -diff -merge -text. Setting it on a path tells Git to skip line ending conversion, skip generating a textual diff, and skip textual merging for that file:

# Images
*.png       binary
*.jpg       binary
*.jpeg      binary
*.gif       binary
*.ico       binary
*.svg       binary
*.webp      binary

# Compiled objects and archives
*.a         binary
*.o         binary
*.so        binary
*.dylib     binary
*.dll       binary
*.exe       binary
*.zip       binary
*.tar       binary
*.gz        binary
*.tgz       binary
*.7z        binary

# Office documents
*.pdf       binary
*.doc       binary
*.docx      binary
*.xls       binary
*.xlsx      binary
*.ppt       binary
*.pptx      binary

# SQLite databases
*.db        binary
*.sqlite    binary
*.sqlite3   binary

# Fonts
*.woff      binary
*.woff2     binary
*.ttf       binary
*.otf       binary
*.eot       binary

Annotating binary files explicitly removes any ambiguity and makes the repository behave correctly regardless of the platform or Git client in use.
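
Because binary is a macro rather than a single attribute, it can be useful to see what it expands to. The sketch below asks git check-attr for every attribute that applies to a hypothetical logo.png.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

printf '*.png binary\n' > .gitattributes

# -a lists every attribute that applies to the path, including macro expansions.
expanded=$(git check-attr -a -- logo.png)
printf '%s\n' "$expanded"
# The listing includes "binary: set" along with text, diff, and merge reported as unset.
```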

Merge Strategies

The merge attribute controls which merge driver Git uses when a merge has to combine changes made to a matched file on both branches. The default driver performs a line-by-line three-way merge and marks unresolvable sections with conflict markers. For certain file types, this default produces worse results than an alternative driver.

merge=union

The built-in union driver resolves conflicts by including all changed lines from both sides, rather than marking the conflict and stopping. This is useful for files where every line is an independent record and the order does not matter much:

# Append-only files where both sides' additions should be preserved
*.po        merge=union

Note

The union strategy can produce logically incorrect results if lines from both sides depend on each other. Use it only for files where the content is genuinely additive and order-independent.
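
The union behavior can be tried outside a repository with git merge-file --union, which applies the same kind of resolution to three files on disk. The file names and contents below are invented for the example.

```shell
work=$(mktemp -d)
cd "$work"

printf 'alpha\n'          > base.txt     # common ancestor
printf 'alpha\nbeta\n'    > ours.txt     # our side appended "beta"
printf 'alpha\ngamma\n'   > theirs.txt   # their side appended "gamma"

# Argument order is <current> <base> <other>; -p prints instead of overwriting ours.txt.
merged=$(git merge-file -p --union ours.txt base.txt theirs.txt)
printf '%s\n' "$merged"
# alpha
# beta
# gamma
```

Both additions survive and no conflict markers are written, which is exactly what makes the result wrong when the two lines were never meant to coexist.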

merge=ours

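ours is not a built-in merge driver. You define a driver with that name in git config and then reference it in .gitattributes. The common use case is generated files that should never be merged: when both branches have changed the file, the current branch's version is kept and the incoming version is discarded. If only the incoming branch changed the file, Git takes that change without invoking the driver.

Each contributor configures the driver once, because Git config is not shared through the repository: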

git config --global merge.ours.driver true

Then reference it in .gitattributes:

# Never merge generated changelogs; keep the current branch's version
CHANGELOG.md    merge=ours
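
The following sketch plays the scenario through in a scratch repository: two branches edit CHANGELOG.md differently, and the merge keeps the current branch's copy. The branch name feature, the commit messages, and the identity values are placeholders, and the driver is configured per repository here rather than with --global.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity for the demo commits
git config user.email "author@example.com"
git config commit.gpgsign false
git config merge.ours.driver true            # "true" exits 0 and leaves our file untouched

printf 'CHANGELOG.md merge=ours\n' > .gitattributes
printf 'base\n' > CHANGELOG.md
git add .
git commit -q -m "base"
trunk=$(git symbolic-ref --short HEAD)       # initial branch name varies by Git setup

git checkout -q -b feature
printf 'base\nfeature entry\n' > CHANGELOG.md
git commit -q -a -m "feature entry"

git checkout -q "$trunk"
printf 'base\ntrunk entry\n' > CHANGELOG.md
git commit -q -a -m "trunk entry"

# Both sides changed the file, so the driver runs and the merge completes cleanly.
git merge -q --no-edit feature
kept=$(cat CHANGELOG.md)
printf '%s\n' "$kept"
# base
# trunk entry
```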

Custom Merge Drivers

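For files with structure that Git's default merge does not understand well, you can register a custom driver. A custom driver is an external program that receives the paths of three temporary files (%O for the common ancestor, %A for the current branch's version, %B for the other branch's version), overwrites the %A file with the merged result, and exits with status zero if the merge was clean.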

Register the driver in ~/.gitconfig or .git/config:

[merge "npm-lockfile"]
    name = npm lock file merge driver
    driver = npx npm-merge-driver merge %O %A %B %P
    recursive = binary

Then assign it in .gitattributes:

package-lock.json   merge=npm-lockfile

This approach is most useful for lock files (npm, yarn, Cargo, poetry) where naive three-way merges produce files that satisfy the merge tool but fail validation at install time.
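
To make the driver contract concrete, here is a deliberately simple sketch of a driver for files that hold one entry per line in sorted order. The function name merge_sorted_lines and the driver name sorted-lines are hypothetical, and the logic ignores the ancestor, so lines deleted on one side come back; a real driver would need to handle that.

```shell
# Hypothetical driver body. Git would invoke it with the three temporary paths,
# for example after: git config merge.sorted-lines.driver 'merge-sorted-lines %O %A %B'
merge_sorted_lines() {
    # $1 = ancestor (%O, unused in this sketch), $2 = current (%A), $3 = other (%B)
    result=$(mktemp)
    # Keep every distinct line from both sides, in byte order.
    LC_ALL=C sort -u "$2" "$3" > "$result" || return 1
    # Git reads the merged content back from the %A file.
    mv "$result" "$2"
    # Exit status 0 tells Git the merge was clean.
    return 0
}

# Try it on three throwaway files.
work=$(mktemp -d)
printf 'apple\nbanana\n'          > "$work/ancestor"
printf 'apple\nbanana\ndate\n'    > "$work/current"
printf 'apple\nbanana\ncherry\n'  > "$work/other"
merge_sorted_lines "$work/ancestor" "$work/current" "$work/other"
cat "$work/current"
# apple
# banana
# cherry
# date
```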

Diff Attributes

The diff attribute controls how Git generates output for git diff, git log -p, and patch generation. By default, diff is enabled for all text files and disabled for binary files. You can override this behavior in either direction and attach custom diff drivers for specific formats.

Disabling Diffs

Setting -diff suppresses textual diff output for a path. This is already implied by binary, but you can apply it independently to a text file when the raw diff output is too noisy to be useful:

# Disable textual diffs for minified assets
*.min.js    -diff
*.min.css   -diff
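
The effect of -diff is easy to confirm in a scratch repository: Git reports that the file changed but prints no line-level output. The file name app.min.js and its contents are invented for the example.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

printf '*.min.js -diff\n' > .gitattributes
printf 'var a=1;\n' > app.min.js
git add .gitattributes app.min.js

printf 'var a=1;var b=2;\n' > app.min.js     # modify the working-tree copy
summary=$(git diff -- app.min.js)
printf '%s\n' "$summary"
# The output ends with a "Binary files ... differ" line instead of +/- hunks.
```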

Custom Diff Drivers

A custom diff driver is an external command Git calls to produce the diff output for a matched file. Custom drivers are most useful for binary formats that have a readable textual representation. For example, you can configure a driver that extracts readable text from a PDF or Office document for diff purposes:

*.pdf       diff=pdf

Register the driver in ~/.gitconfig:

[diff "pdf"]
    textconv = pdftotext -layout

When you run git diff or git log -p on a .pdf file, Git calls pdftotext -layout on each version and diffs the text output. The binary file itself is never modified; the conversion is only for display.
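
The textconv mechanism can be tried without pdftotext installed. The sketch below swaps in od as the converter, which is an assumption made purely so the example runs anywhere; the driver name hex and the file blob.bin are likewise hypothetical.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

printf '*.bin diff=hex\n' > .gitattributes
git config diff.hex.textconv 'od -An -tx1'   # stand-in converter: dump bytes as hex

printf '\001\002\003' > blob.bin
git add .gitattributes blob.bin
printf '\001\002\003\004' > blob.bin         # append one byte to the working-tree copy

# Git runs the converter on both versions and diffs the converted text.
hexdiff=$(git diff -- blob.bin)
printf '%s\n' "$hexdiff"
# The added line shows the hex dump that now includes the trailing 04 byte.
```

As with the PDF driver, only the displayed diff changes; the stored blob is untouched.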

A similar pattern works for SQLite databases, compiled protobuf files, and other formats where an external tool can produce a meaningful textual representation. Register the driver in ~/.gitconfig:

[diff "sqlite3"]
    binary = true
    textconv = echo .dump | sqlite3

Then assign it in .gitattributes:

*.db        diff=sqlite3
*.sqlite    diff=sqlite3

Export Attributes

The export-ignore and export-subst attributes control the output of git archive, the command used to produce release tarballs and zip files.

export-ignore

Files and directories marked export-ignore are excluded from archives produced by git archive. This is the standard way to prevent development tooling, CI configuration, test suites, and documentation sources from appearing in a published release artifact:

# Exclude development and CI files from release archives
.github/            export-ignore
.devcontainer/      export-ignore
.gitattributes      export-ignore
.gitignore          export-ignore
.pre-commit-config.yaml  export-ignore
.editorconfig       export-ignore
Makefile            export-ignore
tests/              export-ignore
docs/               export-ignore

The result is an archive that contains only the files a consumer of the release actually needs, without any of the repository scaffolding.

Tip

Always verify the contents of a release archive before publishing it. Run git archive HEAD --output=/tmp/release.tar.gz && tar -tzf /tmp/release.tar.gz and review the file list to confirm that no internal tooling or sensitive configuration was included.
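
The verification step can be rehearsed in a scratch repository. Note that git archive reads .gitattributes from the commit being archived, so the rules must be committed before they take effect. The directory layout, file contents, and identity values below are placeholders.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity for the demo commit
git config user.email "author@example.com"
git config commit.gpgsign false

mkdir src tests
printf 'print("app")\n'  > src/app.py
printf 'print("test")\n' > tests/test_app.py
printf 'tests/ export-ignore\n.gitattributes export-ignore\n' > .gitattributes
git add .
git commit -q -m "initial"

# Stream the archive straight into tar and list what it contains.
listing=$(git archive HEAD | tar -tf -)
printf '%s\n' "$listing"
# src/app.py is listed; tests/ and .gitattributes are absent.
```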

export-subst

The export-subst attribute tells Git to perform keyword substitution when including a file in an archive. Git replaces $Format:<format-string>$ placeholders with values from the commit the archive is built from:

# Substitute version metadata into the package info file
package_info.py     export-subst

Inside package_info.py:

__version__ = "$Format:%H$"
__date__    = "$Format:%ci$"

After git archive, those placeholders are replaced with the full commit hash (%H) and the committer date (%ci) of the commit being archived. This is a lightweight way to embed version information without a separate build step.
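
A quick way to confirm the substitution is to archive a scratch repository and compare the expanded value against git rev-parse. The sketch below uses only the %H placeholder; the identity values are placeholders.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example Author"        # placeholder identity for the demo commit
git config user.email "author@example.com"
git config commit.gpgsign false

printf 'package_info.py export-subst\n' > .gitattributes
# %% is printf's escape for a literal percent sign; the file receives $Format:%H$.
printf '__version__ = "$Format:%%H$"\n' > package_info.py
git add .
git commit -q -m "initial"

# Extract just that file from the archive to stdout.
archived=$(git archive HEAD | tar -xOf - package_info.py)
head_hash=$(git rev-parse HEAD)
printf '%s\n' "$archived"
# The archived copy carries the 40-character hash; the working-tree file is unchanged.
```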

GitHub Linguist Overrides

GitHub uses the Linguist library to detect programming languages and generate the repository language statistics shown in the sidebar. Linguist's automatic detection is accurate for most files, but it makes assumptions that sometimes need adjustment, particularly for repositories that contain large amounts of generated code, vendored dependencies, or documentation.

The linguist-* family of attributes lets you override Linguist's behavior directly from .gitattributes, without modifying any Linguist configuration files.

linguist-detectable

By default, Linguist counts only languages of type programming or markup toward the statistics; data and prose languages are skipped. Setting linguist-detectable forces Linguist to count a file type it would otherwise skip:

*.tf        linguist-detectable=true

Setting it to false hides files from language detection entirely:

docs/**     linguist-detectable=false

linguist-documentation

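Files marked as documentation are excluded from language statistics. Match the files with docs/** rather than docs/, because an attribute pattern that matches a directory does not apply to the files inside it:

docs/**         linguist-documentation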

linguist-generated

Files marked as generated are collapsed in diffs and excluded from language statistics. This is the right setting for lock files, compiled outputs, and other machine-generated files that clutter pull request diffs:

# Lock files
package-lock.json   linguist-generated=true
yarn.lock           linguist-generated=true
Cargo.lock          linguist-generated=true
poetry.lock         linguist-generated=true
go.sum              linguist-generated=true

# Generated source files
*.pb.go             linguist-generated=true
*.pb.ts             linguist-generated=true

linguist-language

If Linguist misidentifies a file's language, you can correct it:

# HCL files in this repo are Terraform, not generic HCL
*.tf        linguist-language=Terraform

linguist-vendored

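Vendored dependencies are excluded from language statistics. Unlike linguist-generated, this attribute does not collapse the files in diffs. Match the files with vendor/** rather than vendor/, because an attribute pattern that matches a directory does not apply to the files inside it:

vendor/**         linguist-vendored=true
third_party/**    linguist-vendored=true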
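
Linguist itself only runs on GitHub, but git check-attr shows which paths a pattern actually reaches, and that is worth checking for directory rules. The sketch below compares a trailing-slash pattern with a /** pattern. It reports what Git resolves locally, on the assumption that Linguist sees the same attribute values; the file name lib.js is hypothetical.

```shell
# Scratch repository; nothing here touches a real project.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

cat > .gitattributes <<'EOF'
vendor/          linguist-vendored
third_party/**   linguist-vendored
EOF

reach=$(git check-attr linguist-vendored -- vendor/lib.js third_party/lib.js)
printf '%s\n' "$reach"
# vendor/lib.js: linguist-vendored: unspecified
# third_party/lib.js: linguist-vendored: set
```

The trailing-slash pattern matches only the directory entry itself, so files beneath it never receive the attribute.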

A Starter .gitattributes

The following is a practical starting point for most repositories. Adjust the text and binary sections to match the languages and file types your project actually uses:

# .gitattributes
#
# Normalize line endings and classify file types for consistent behavior
# across platforms and tools.

# ----- Default -----
# Detect text files automatically and normalize line endings in the index.
# Specific text types below override the checkout line ending to LF.
*               text=auto

# ----- Source Code -----
*.sh            text eol=lf
*.bash          text eol=lf
*.zsh           text eol=lf
*.py            text eol=lf
*.go            text eol=lf
*.rb            text eol=lf
*.java          text eol=lf
*.c             text eol=lf
*.cpp           text eol=lf
*.h             text eol=lf
*.cs            text eol=lf
*.js            text eol=lf
*.ts            text eol=lf
*.jsx           text eol=lf
*.tsx           text eol=lf
*.tf            text eol=lf
*.hcl           text eol=lf
*.groovy        text eol=lf

# ----- Data and Config -----
*.json          text eol=lf
*.yaml          text eol=lf
*.yml           text eol=lf
*.toml          text eol=lf
*.ini           text eol=lf
*.env           text eol=lf
*.xml           text eol=lf
*.csv           text eol=lf
*.sql           text eol=lf
*.proto         text eol=lf

# ----- Markup and Documentation -----
*.md            text eol=lf
*.rst           text eol=lf
*.txt           text eol=lf
*.html          text eol=lf
*.css           text eol=lf
*.scss          text eol=lf

# ----- Windows Scripts (must use CRLF) -----
*.bat           text eol=crlf
*.cmd           text eol=crlf
*.ps1           text eol=crlf

# ----- Binary: Images -----
*.png           binary
*.jpg           binary
*.jpeg          binary
*.gif           binary
*.ico           binary
*.webp          binary
*.svg           binary

# ----- Binary: Fonts -----
*.woff          binary
*.woff2         binary
*.ttf           binary
*.otf           binary
*.eot           binary

# ----- Binary: Archives -----
*.zip           binary
*.tar           binary
*.gz            binary
*.tgz           binary
*.7z            binary

# ----- Binary: Executables and Libraries -----
*.a             binary
*.o             binary
*.so            binary
*.dylib         binary
*.dll           binary
*.exe           binary

# ----- Binary: Documents -----
*.pdf           binary
*.doc           binary
*.docx          binary
*.xls           binary
*.xlsx          binary
*.ppt           binary
*.pptx          binary

# ----- Binary: Databases -----
*.db            binary
*.sqlite        binary
*.sqlite3       binary

# ----- Export: Exclude from release archives -----
.github/                export-ignore
.devcontainer/          export-ignore
.gitattributes          export-ignore
.gitignore              export-ignore
.editorconfig           export-ignore
.pre-commit-config.yaml export-ignore
Makefile                export-ignore
tests/                  export-ignore

# ----- Linguist: Suppress generated files in diffs and stats -----
package-lock.json       linguist-generated=true
yarn.lock               linguist-generated=true
go.sum                  linguist-generated=true
poetry.lock             linguist-generated=true
Cargo.lock              linguist-generated=true
*.pb.go                 linguist-generated=true

Commit the file and then renormalize if the repository already has inconsistent line endings:

git add .gitattributes
git commit -S -m "chore: add .gitattributes for line ending normalization and file classification"
git add --renormalize .
git commit -S -m "chore: normalize line endings"

A .gitattributes file is one of the highest-leverage configuration investments in a repository. One committed file enforces consistent line endings, correct binary handling, clean diffs, and right-sized release archives for every contributor and every CI runner, with no per-machine setup required. Adding it early in a project's life is far less disruptive than retrofitting it after hundreds of commits have locked in inconsistent line endings across a dozen contributors' machines.