Controlling Git Repository Behavior with .gitattributes¶
A .gitattributes file lets you assign attributes to file paths in a Git repository, giving you explicit control over how Git handles those files during checkouts, merges, diffs, and archives. Line ending inconsistencies, corrupted binary diffs, merge conflicts in generated files, and bloated archive packages are all problems that a well-constructed .gitattributes file prevents before they reach your team.
What Is a .gitattributes File?¶
A .gitattributes file is a plain-text configuration file that maps path patterns to one or more attributes. Git reads it during operations that touch the working tree or the object database, and it uses the attributes to decide how to process each matched file. The same repository can carry multiple .gitattributes files at different directory levels, and a global file can apply across all repositories on a machine.
Locations¶
Git checks for .gitattributes files in four places, from highest to lowest priority:
| Location | Scope | Notes |
|---|---|---|
| .git/info/attributes | Repository | Local only; never committed; overrides the committed files |
| .gitattributes (repository root or any subdirectory) | Repository | Committed to the repo; applies to all contributors |
| core.attributesFile (global config; defaults to $XDG_CONFIG_HOME/git/attributes) | User | Applies to all of the user's repositories on the machine |
| $(prefix)/etc/gitattributes (system config) | System | Applies to all users on the machine |
Rules in a subdirectory .gitattributes file apply only to paths within that subdirectory and its descendants. Rules in the repository root apply to everything unless overridden by a more specific file lower in the tree.
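As a rough illustration of that precedence, the sketch below builds a throwaway repository in which a nested .gitattributes overrides a root rule, then asks Git which value it resolves for each path. The directory and file names (vendor/fixtures, notes.txt, sample.txt) are hypothetical; git check-attr works on paths that do not exist yet, so no files need to be created.

```shell
set -eu

# Throwaway repository; every path below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q
mkdir -p vendor/fixtures

# Root rule: treat every .txt file as text with LF endings.
printf '*.txt text eol=lf\n' > .gitattributes
# Nested rule: fixtures under vendor/fixtures keep their bytes untouched.
printf '*.txt -text\n' > vendor/fixtures/.gitattributes

# git check-attr reports the effective value after all files are consulted.
root_attr=$(git check-attr text -- notes.txt)
nested_attr=$(git check-attr text -- vendor/fixtures/sample.txt)
printf '%s\n%s\n' "$root_attr" "$nested_attr"
```

The nested file only changes the attributes it names; anything it does not mention still comes from the root file.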
Commit the file at the repository root
The .gitattributes file at the repository root is the most important one for team consistency. Because it is committed to the repository, every contributor who clones or fetches gets the same behavior automatically, without any local configuration step.
Syntax¶
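Each non-blank, non-comment line in a .gitattributes file has the form pattern attr1 attr2 ..., that is, a path pattern followed by one or more whitespace-separated attributes. Lines beginning with # are comments.
The pattern follows the same glob rules used in .gitignore, with two exceptions: negative patterns (a leading !) are not allowed, and a pattern that matches a directory does not apply recursively to the files inside it, so attributes that act on file contents, such as text or diff, need a pattern like dir/** rather than dir/. Within those rules, * matches any sequence of characters within a path component, ** matches across path separators, and a leading / anchors the pattern to the directory containing the .gitattributes file. The last matching pattern wins for each attribute, so more specific rules placed later in the file take precedence over broader ones.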
An attribute can be set in one of four states:
| Form | Meaning |
|---|---|
| attr | Set (true) |
| -attr | Unset (false) |
| attr=value | Set to a specific value |
| !attr | Unspecified (use the built-in default) |
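The four states, and the last-match-wins rule, can be observed with git check-attr. The following is a minimal sketch in a temporary repository; the file names (a.txt, b.png, c.md, raw.dat) are made up for the demonstration.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# The broad rule comes first; later, more specific lines override it.
cat > .gitattributes <<'EOF'
*        text=auto
*.txt    text
*.png    -text
raw.dat  !text
EOF

# One line of output per path, showing the resolved state of "text".
states=$(git check-attr text -- a.txt b.png c.md raw.dat)
printf '%s\n' "$states"
```

Git reports the states as set, unset, a literal value (auto here), and unspecified. c.md falls through to the catch-all rule, while raw.dat has the catch-all value explicitly cleared.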
Line Ending Normalization¶
Line ending inconsistency is one of the most common sources of noise in cross-platform repositories. Windows tools default to CRLF (\r\n); macOS and Linux use LF (\n). When contributors on different platforms commit files without normalization, a single changed line can produce a diff that touches every line in the file. On top of the diff noise, CRLF in shell scripts, makefiles, and configuration files frequently causes runtime failures on Linux-based CI runners and containers.
The text attribute and its related settings give you explicit control over how Git stores and checks out line endings.
The text Attribute¶
Setting text on a path tells Git to treat the file as text and normalize line endings when committing and checking out. The exact behavior depends on the eol attribute and the core.eol and core.autocrlf git config values.
| Attribute | Effect |
|---|---|
| text | Normalize on commit and convert on checkout based on platform and config |
| text=auto | Let Git decide whether the file is text; normalize if it detects text |
| -text | Disable all line ending conversion for this path |
| !text | Restore the built-in default for this path |
| binary | Built-in macro for -diff -merge -text |
Recommended Setup¶
The most practical approach for most repositories is a catch-all text=auto rule followed by explicit per-extension overrides in .gitattributes:
# Normalize all text files to LF in the repository
* text=auto
*.sh text eol=lf
*.bash text eol=lf
*.py text eol=lf
*.go text eol=lf
*.rb text eol=lf
*.yaml text eol=lf
*.yml text eol=lf
*.json text eol=lf
*.md text eol=lf
*.txt text eol=lf
*.html text eol=lf
*.css text eol=lf
*.js text eol=lf
*.ts text eol=lf
# Files that must use CRLF on checkout (Windows-specific)
*.bat text eol=crlf
*.cmd text eol=crlf
*.ps1 text eol=crlf
The text=auto fallback on * handles any file type not listed explicitly. Git inspects the content and normalizes line endings only for files it detects as text. Binary files are left untouched.
The eol=lf overrides on source files ensure that on checkout, regardless of core.eol or core.autocrlf settings on the local machine, the working-tree copy always has LF line endings. This is critical for scripts and files that run inside Linux containers or CI runners.
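To check which rule Git will actually apply to a given file, git check-attr can report the resolved text and eol values. This sketch uses a cut-down version of the setup above in a temporary repository; deploy.sh, setup.bat, and photo.jpg are hypothetical file names.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

cat > .gitattributes <<'EOF'
* text=auto
*.sh  text eol=lf
*.bat text eol=crlf
EOF

# Ask for both attributes; output is grouped per path.
resolved=$(git check-attr text eol -- deploy.sh setup.bat photo.jpg)
printf '%s\n' "$resolved"
```

The script and batch file resolve to explicit lf and crlf values, while the image only inherits text=auto and leaves eol unspecified, so Git's content detection decides how it is handled.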
Renormalizing an Existing Repository¶
If you add .gitattributes to a repository that already has inconsistent line endings, the existing committed files are not automatically corrected. To renormalize, re-stage the tracked files with git add --renormalize . and commit the result.
git add --renormalize re-stages every tracked file through the .gitattributes rules, replacing whatever is currently in the index with the normalized version. The resulting commit fixes the stored line endings from that point forward, so future checkouts get consistent files without rewriting any existing commits.
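The effect can be reproduced end to end in a scratch repository. The sketch below commits a file with CRLF endings, adds a .gitattributes afterwards, and uses git ls-files --eol to compare what the index holds before and after renormalizing. The file name, identity, and commit messages are placeholders, and commit signing is switched off only so the demo runs without a key.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false
git config core.autocrlf false   # let the CRLF file be committed unchanged

# Step 1: a file with CRLF endings lands in history before any attributes exist.
printf 'line one\r\nline two\r\n' > notes.txt
git add notes.txt
git commit -q -m "add notes with CRLF endings"
before=$(git ls-files --eol notes.txt)

# Step 2: introduce the attributes, then re-stage tracked files through them.
printf '* text=auto\n' > .gitattributes
git add .gitattributes
git add --renormalize .
git commit -q -m "chore: normalize line endings"
after=$(git ls-files --eol notes.txt)

printf 'before: %s\nafter:  %s\n' "$before" "$after"
```

The i/ column of git ls-files --eol describes the index copy, which is the part that changes from crlf to lf; the working-tree copy is left as it was until the file is next checked out.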
Coordinate the renormalization commit with your team
A renormalization commit can touch every text file whose stored line endings change, which in a badly mixed repository is most of them. In-flight branches that modify those files are likely to conflict with it. Announce the change, merge it when the branch is quiet, and ask contributors to rebase their branches against it before continuing.
Binary File Classification¶
Git uses a heuristic to decide whether a file is binary or text: it looks at the first few kilobytes of content for NUL bytes and a high ratio of non-printable characters. That heuristic is usually right, but it fails for some formats, and the failures cause real problems:
- Treating a compiled binary as text produces garbled diff output.
- Treating a binary file as text can corrupt it during a merge or checkout on Windows if core.autocrlf is active.
- Some text-like binary formats (such as SQLite databases) look like text to the heuristic and get incorrectly normalized.
The binary attribute is a built-in macro for -diff -merge -text. Setting it on a path tells Git to skip line ending conversion, skip generating a textual diff, and skip textual merging for that file:
# Images
*.png binary
*.jpg binary
*.jpeg binary
*.gif binary
*.ico binary
*.svg binary
*.webp binary
# Compiled objects and archives
*.a binary
*.o binary
*.so binary
*.dylib binary
*.dll binary
*.exe binary
*.zip binary
*.tar binary
*.gz binary
*.tgz binary
*.7z binary
# Office documents
*.pdf binary
*.doc binary
*.docx binary
*.xls binary
*.xlsx binary
*.ppt binary
*.pptx binary
# SQLite databases
*.db binary
*.sqlite binary
*.sqlite3 binary
# Fonts
*.woff binary
*.woff2 binary
*.ttf binary
*.otf binary
*.eot binary
Annotating binary files explicitly removes any ambiguity and makes the repository behave correctly regardless of the platform or Git client in use.
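To see what the binary macro expands to on your Git version, you can query the individual attributes it controls. A minimal sketch, using a hypothetical logo.png in a temporary repository:

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

printf '*.png binary\n' > .gitattributes

# Query the macro itself plus the attributes it is expected to switch off.
expansion=$(git check-attr binary diff merge text -- logo.png)
printf '%s\n' "$expansion"
```

The macro reports as set, and diff, merge, and text each report as unset, which is why a single binary entry is enough to cover conversion, diffing, and merging.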
Merge Strategies¶
The merge attribute controls which merge driver Git uses when a matched file needs a file-level three-way merge. The default driver merges line by line and marks unresolvable sections with conflict markers. For certain file types, this default produces worse results than an alternative driver.
merge=union¶
The union strategy resolves conflicts by including all changed lines from both sides, rather than marking the conflict and stopping. This is useful for files where every line is an independent record and the order does not matter much:
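As a sketch of the behavior, assuming a changelog-style file is the kind of additive content meant here, the script below has two branches append different lines to the same file and then merges them. The file name CHANGELOG.txt, the branch name topic, and the commit messages are hypothetical.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

printf 'CHANGELOG.txt merge=union\n' > .gitattributes
printf 'initial entry\n' > CHANGELOG.txt
git add .
git commit -q -m "base"
start=$(git rev-parse --abbrev-ref HEAD)   # main or master, depending on config

# Both branches append a line at the same position, which normally conflicts.
git checkout -q -b topic
printf 'entry from topic\n' >> CHANGELOG.txt
git commit -q -am "topic entry"

git checkout -q "$start"
printf 'entry from start branch\n' >> CHANGELOG.txt
git commit -q -am "start branch entry"

# With merge=union the merge completes and keeps the lines from both sides.
git merge -q --no-edit topic
merged=$(cat CHANGELOG.txt)
printf '%s\n' "$merged"
```

Without the attribute, the same merge would stop with conflict markers around the two appended lines.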
Note
The union strategy can produce logically incorrect results if lines from both sides depend on each other. Use it only for files where the content is genuinely additive and order-independent.
merge=ours¶
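Git has no built-in merge driver named ours (the ours merge strategy, git merge -s ours, is a separate, branch-wide mechanism). You register a driver under that name in git config and then reference it in .gitattributes. The common use case is generated files that should never be merged: when both sides have changed the file, the current branch's version wins and the incoming version is discarded.
Configure the driver once per clone, because git config settings are not committed, then assign it to the relevant paths in .gitattributes.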
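One common way to set this up, shown here as a sketch rather than the only option, is to register the true command as the driver: it exits successfully without touching the current branch's copy. The file name generated.txt and the branch name topic are hypothetical.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

# Register the driver (local config) and assign it to a path.
git config merge.ours.driver true
printf 'generated.txt merge=ours\n' > .gitattributes

printf 'base\n' > generated.txt
git add .
git commit -q -m "base"
start=$(git rev-parse --abbrev-ref HEAD)

git checkout -q -b topic
printf 'from topic\n' > generated.txt
git commit -q -am "regenerate on topic"

git checkout -q "$start"
printf 'from current branch\n' > generated.txt
git commit -q -am "regenerate on start branch"

# Both sides changed the file, so the driver runs and keeps the current copy.
git merge -q --no-edit topic
merged=$(cat generated.txt)
printf '%s\n' "$merged"
```

If only the incoming branch had changed the file, no file-level merge would be needed and Git would simply take the incoming version, so this driver is not a general "never update this file" switch.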
Custom Merge Drivers¶
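For files with structure that Git's default merge does not understand well, you can register a custom driver. A custom driver is an external program that receives the three input files (ancestor, ours, theirs), overwrites the ours file with the merged result, and exits with a non-zero status if conflicts remain. In the driver command, %O, %A, and %B are replaced with temporary files holding the ancestor, current, and other versions, and %P with the path being merged.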
Register the driver in ~/.gitconfig or .git/config:
[merge "npm-lockfile"]
name = npm lock file merge driver
driver = npx npm-merge-driver merge %O %A %B %P
recursive = binary
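Then assign it in .gitattributes by setting merge=npm-lockfile on the lock file path, for example package-lock.json merge=npm-lockfile.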
This approach is most useful for lock files (npm, yarn, Cargo, poetry) where naive three-way merges produce files that satisfy the merge tool but fail validation at install time.
Diff Attributes¶
The diff attribute controls how Git generates output for git diff, git log -p, and patch generation. By default, diff is enabled for all text files and disabled for binary files. You can override this behavior in either direction and attach custom diff drivers for specific formats.
Disabling Diffs¶
Setting -diff suppresses textual diff output for a path. This is already implied by binary, but you can apply it independently to a text file when the raw diff output is too noisy to be useful:
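As an illustration, assuming minified JavaScript is the noisy text file in question, the sketch below marks a hypothetical app.min.js with -diff and shows what git diff prints after a change.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

printf '*.min.js -diff\n' > .gitattributes
printf 'var a=1;\n' > app.min.js
git add .
git commit -q -m "add minified bundle"

# Change the file, then ask for a diff of the working tree.
printf 'var a=2;\n' > app.min.js
diff_out=$(git diff -- app.min.js)
printf '%s\n' "$diff_out"
```

Git still records that the file changed, but reports it with a "Binary files ... differ" line instead of a line-by-line patch. Line ending handling is unaffected, because -diff does not touch the text attribute.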
Custom Diff Drivers¶
A custom diff driver is an external command Git calls to produce the diff output for a matched file. Custom drivers are most useful for binary formats that have a readable textual representation. For example, you can assign a named driver to PDF or Office documents with a diff=<driver> attribute in .gitattributes and register a text-conversion command for that driver (diff.<driver>.textconv) in ~/.gitconfig.
When you run git diff or git log -p on a .pdf file, Git runs the conversion command (for example one based on pdftotext -layout) on each version and diffs the text output. The binary file itself is never modified; the conversion is only for display.
A similar pattern works for SQLite databases, compiled protobuf files, and other formats where an external tool can produce a meaningful textual representation.
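A possible wiring for the PDF and SQLite cases is sketched below. The driver names pdf and sqlite3 are arbitrary labels chosen for this example, and the commands assume pdftotext and the sqlite3 CLI are installed. A textconv command receives the file path as its final argument and must write to standard output, which is why pdftotext is wrapped so that it can be given - as its output file. The settings are written to the repository's local config here; use git config --global to place them in ~/.gitconfig instead.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# Attach a named diff driver to each format.
cat > .gitattributes <<'EOF'
*.pdf     diff=pdf
*.sqlite3 diff=sqlite3
EOF

# Register the conversion commands for those driver names.
git config diff.pdf.textconv "sh -c 'pdftotext -layout \"\$0\" -'"
git config diff.sqlite3.textconv "echo .dump | sqlite3"

pdf_driver=$(git check-attr diff -- manual.pdf)
pdf_textconv=$(git config --get diff.pdf.textconv)
sqlite_driver=$(git check-attr diff -- data.sqlite3)
printf '%s\n%s\n%s\n' "$pdf_driver" "$pdf_textconv" "$sqlite_driver"
```

Because driver definitions live in git config rather than in the repository, each contributor has to register them; without the config entry, Git falls back to its normal handling for the file.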
Export Attributes¶
The export-ignore and export-subst attributes control the output of git archive, the command used to produce release tarballs and zip files.
export-ignore¶
Files and directories marked export-ignore are excluded from archives produced by git archive. This is the standard way to prevent development tooling, CI configuration, test suites, and documentation sources from appearing in a published release artifact:
# Exclude development and CI files from release archives
.github/ export-ignore
.devcontainer/ export-ignore
.gitattributes export-ignore
.gitignore export-ignore
.pre-commit-config.yaml export-ignore
.editorconfig export-ignore
Makefile export-ignore
tests/ export-ignore
docs/ export-ignore
The result is an archive that contains only the files a consumer of the release actually needs, without any of the repository scaffolding.
Tip
Always verify the contents of a release archive before publishing it. Run git archive HEAD --output=/tmp/release.tar.gz && tar -tzf /tmp/release.tar.gz and review the file list to confirm that no internal tooling or sensitive configuration was included.
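The same verification can be tried on a scratch repository to see export-ignore in action. In this sketch the src and tests directories and their files are hypothetical; note that git archive reads attributes from the committed tree, so the .gitattributes file has to be committed before the archive is built.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

mkdir -p src tests
printf 'print("app")\n' > src/app.py
printf 'print("test")\n' > tests/test_app.py
cat > .gitattributes <<'EOF'
tests/         export-ignore
.gitattributes export-ignore
EOF

git add .
git commit -q -m "initial layout"

# List the archive contents without writing the archive to disk.
listing=$(git archive --format=tar HEAD | tar -tf -)
printf '%s\n' "$listing"
```

Only the src entries should remain in the listing; the tests directory and the .gitattributes file itself are left out.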
export-subst¶
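The export-subst attribute tells Git to perform keyword substitution when including a file in an archive. Git replaces $Format:<format-string>$ placeholders with values from the commit the archive is built from, using the same format placeholders as git log --pretty=format:. For example, a file such as package_info.py can be marked export-subst in .gitattributes and carry one placeholder for the commit hash ($Format:%H$) and another for the author date.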
After git archive, those placeholders are replaced with the full commit hash and the author date of the HEAD commit. This is a lightweight way to embed version information without a separate build step.
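A small end-to-end sketch of the substitution follows. The contents of package_info.py and the variable names in it are invented for the example, and %ad is only one of several author-date placeholders that could be used.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Author"
git config user.email "author@example.invalid"
git config commit.gpgsign false

printf 'package_info.py export-subst\n' > .gitattributes
# Quoted heredoc: the shell leaves the $Format:...$ placeholders untouched.
cat > package_info.py <<'EOF'
COMMIT = "$Format:%H$"
COMMIT_DATE = "$Format:%ad$"
EOF

git add .
git commit -q -m "add package info"
head_sha=$(git rev-parse HEAD)

# Extract the archived copy of the file to stdout and compare it with HEAD.
expanded=$(git archive --format=tar HEAD | tar -xOf - package_info.py)
printf '%s\n' "$expanded"
```

The working-tree file keeps its placeholders; only the copy inside the archive carries the expanded values.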
GitHub Linguist Overrides¶
GitHub uses the Linguist library to detect programming languages and generate the repository language statistics shown in the sidebar. Linguist's automatic detection is accurate for most files, but it makes assumptions that sometimes need adjustment, particularly for repositories that contain large amounts of generated code, vendored dependencies, or documentation.
The linguist-* family of attributes lets you override Linguist's behavior directly from .gitattributes, without modifying any Linguist configuration files.
linguist-detectable¶
By default, Linguist leaves certain language types, such as data and prose formats, out of the statistics. Setting linguist-detectable forces Linguist to include a file type it would otherwise skip.
Setting it to false (linguist-detectable=false) excludes the matched files from the language statistics.
linguist-documentation¶
Files marked linguist-documentation are excluded from language statistics.
linguist-generated¶
Files marked as generated are collapsed in diffs and excluded from language statistics. This is the right setting for lock files, compiled outputs, and other machine-generated files that clutter pull request diffs:
# Lock files
package-lock.json linguist-generated=true
yarn.lock linguist-generated=true
Cargo.lock linguist-generated=true
poetry.lock linguist-generated=true
go.sum linguist-generated=true
# Generated source files
*.pb.go linguist-generated=true
*.pb.ts linguist-generated=true
linguist-language¶
If Linguist misidentifies a file's language, you can correct it by setting linguist-language to the intended language name.
linguist-vendored¶
Vendored dependencies are excluded from language statistics. Unlike linguist-generated, this attribute does not collapse the files in diff views.
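The overrides in this section might be combined as in the sketch below. All paths and extensions (the .sql files, tools/export.py, examples/, the .tpl files, third_party/) are hypothetical. Git treats linguist-* as ordinary attributes, so git check-attr can confirm that the patterns match the intended paths, although the effect on statistics is only visible on GitHub.

```shell
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

cat > .gitattributes <<'EOF'
# Count a file type that is skipped by default.
*.sql            linguist-detectable
# Keep a helper script out of the statistics.
tools/export.py  linguist-detectable=false
# Treat sample code as documentation.
examples/**      linguist-documentation
# Reclassify files with an ambiguous extension.
*.tpl            linguist-language=HTML
# Mark third-party code as vendored.
third_party/**   linguist-vendored
EOF

detectable=$(git check-attr linguist-detectable -- tools/export.py)
language=$(git check-attr linguist-language -- page.tpl)
vendored=$(git check-attr linguist-vendored -- third_party/lib/a.c)
printf '%s\n%s\n%s\n' "$detectable" "$language" "$vendored"
```

The directory rules use dir/** rather than dir/ so that they apply to the files inside the directory, not just to the directory entry.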
A Starter .gitattributes¶
The following is a practical starting point for most repositories. Adjust the text and binary sections to match the languages and file types your project actually uses:
# .gitattributes
#
# Normalize line endings and classify file types for consistent behavior
# across platforms and tools.
# ----- Default -----
# Detect text files automatically and normalize line endings in the index.
# Specific text types below override the checkout line ending to LF.
* text=auto
# ----- Source Code -----
*.sh text eol=lf
*.bash text eol=lf
*.zsh text eol=lf
*.py text eol=lf
*.go text eol=lf
*.rb text eol=lf
*.java text eol=lf
*.c text eol=lf
*.cpp text eol=lf
*.h text eol=lf
*.cs text eol=lf
*.js text eol=lf
*.ts text eol=lf
*.jsx text eol=lf
*.tsx text eol=lf
*.tf text eol=lf
*.hcl text eol=lf
*.groovy text eol=lf
# ----- Data and Config -----
*.json text eol=lf
*.yaml text eol=lf
*.yml text eol=lf
*.toml text eol=lf
*.ini text eol=lf
*.env text eol=lf
*.xml text eol=lf
*.csv text eol=lf
*.sql text eol=lf
*.proto text eol=lf
# ----- Markup and Documentation -----
*.md text eol=lf
*.rst text eol=lf
*.txt text eol=lf
*.html text eol=lf
*.css text eol=lf
*.scss text eol=lf
# ----- Windows Scripts (must use CRLF) -----
*.bat text eol=crlf
*.cmd text eol=crlf
*.ps1 text eol=crlf
# ----- Binary: Images -----
*.png binary
*.jpg binary
*.jpeg binary
*.gif binary
*.ico binary
*.webp binary
*.svg binary
# ----- Binary: Fonts -----
*.woff binary
*.woff2 binary
*.ttf binary
*.otf binary
*.eot binary
# ----- Binary: Archives -----
*.zip binary
*.tar binary
*.gz binary
*.tgz binary
*.7z binary
# ----- Binary: Executables and Libraries -----
*.a binary
*.o binary
*.so binary
*.dylib binary
*.dll binary
*.exe binary
# ----- Binary: Documents -----
*.pdf binary
*.doc binary
*.docx binary
*.xls binary
*.xlsx binary
*.ppt binary
*.pptx binary
# ----- Binary: Databases -----
*.db binary
*.sqlite binary
*.sqlite3 binary
# ----- Export: Exclude from release archives -----
.github/ export-ignore
.devcontainer/ export-ignore
.gitattributes export-ignore
.gitignore export-ignore
.editorconfig export-ignore
.pre-commit-config.yaml export-ignore
Makefile export-ignore
tests/ export-ignore
# ----- Linguist: Suppress generated files in diffs and stats -----
package-lock.json linguist-generated=true
yarn.lock linguist-generated=true
go.sum linguist-generated=true
poetry.lock linguist-generated=true
Cargo.lock linguist-generated=true
*.pb.go linguist-generated=true
Commit the file and then renormalize if the repository already has inconsistent line endings:
git add .gitattributes
git commit -S -m "chore: add .gitattributes for line ending normalization and file classification"
git add --renormalize .
git commit -S -m "chore: normalize line endings"
A .gitattributes file is one of the highest-leverage configuration investments in a repository. One committed file enforces consistent line endings, correct binary handling, clean diffs, and right-sized release archives for every contributor and every CI runner, with no per-machine setup required. Adding it early in a project's life is far less disruptive than retrofitting it after hundreds of commits have locked in inconsistent line endings across a dozen contributors' machines.