File Formats

File formats can affect long-term preservation and reuse. While researchers may use proprietary file formats for analysis, converting data to open and/or standard formats will help ensure the data can be rendered and accessed in the future. Researchers can also choose to make data available in both preservation-friendly formats and original file formats.

Best practice is to select formats that are open, documented standards: non-proprietary, unencrypted, uncompressed, and commonly used by your research community. For example, save spreadsheet-based (tabular) data as comma-separated values (.csv) instead of Excel (.xls, .xlsx), and save text files as plain text (.txt) or PDF/A (.pdf) instead of Microsoft Word (.doc, .docx).
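As a minimal sketch of the CSV recommendation, the following Python example writes tabular data to a UTF-8 .csv file using only the standard library. It assumes the data is already in memory as a list of dictionaries; the function name `write_csv` and the column names in the usage note are hypothetical and chosen only for illustration. Reading directly from an Excel workbook would need a third-party library and is not shown here.

```python
import csv
from pathlib import Path


def write_csv(rows, path):
    """Write a list of dictionaries to a UTF-8 CSV file with a header row.

    `rows` is assumed to be a non-empty list of dicts that share the same keys.
    """
    if not rows:
        raise ValueError("no rows to write")
    # Use the keys of the first row as the column order.
    fieldnames = list(rows[0].keys())
    # newline="" lets the csv module manage line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return Path(path)
```

For example, `write_csv([{"site": "A", "temp_c": 21.5}], "readings.csv")` would produce a plain-text file with a header line followed by one data line, readable without any proprietary software.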

Repositories may provide a list of preferred file formats. The Library of Congress and the National Archives and Records Administration (NARA) also provide information on recommended file formats. Here is a list of the file formats we typically recommend:

  • Audio
    • WAV
    • MP3 (MPEG-1 Audio Layer III)*
  • Computer Aided Design (CAD)
    • PDF*
    • PRC
    • U3D
  • Database
    • MDB
    • SQLite
    • XML
  • Image
    • BMP*
    • JPEG*
    • PNG
    • TIF
    • TIFF
  • Tabular
    • CSV*
    • ODS
    • TXT*
    • XLSX
  • Text (Formatted)
    • DOCX
    • PDF*
    • ODT
    • RTF
  • Text (Plain)
    • TXT
  • Video
    • AVI
    • MOV
    • MPEG-4 (MP4)*

*Preferred format.
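The list above could be used to screen files before deposit. The following is a rough sketch, not an official tool: it assumes each format maps to the file extensions shown in the sets below (for example, `.jpg` and `.jpeg` for JPEG, `.sqlite` for SQLite), and the names `PREFERRED`, `RECOMMENDED`, and `classify` are hypothetical.

```python
from pathlib import Path

# Extensions for formats marked with * in the list (assumed mapping).
PREFERRED = {".mp3", ".pdf", ".bmp", ".jpg", ".jpeg", ".csv", ".txt", ".mp4"}

# Extensions for the remaining listed formats (assumed mapping).
RECOMMENDED = {
    ".wav", ".prc", ".u3d", ".mdb", ".sqlite", ".xml", ".png",
    ".tif", ".tiff", ".ods", ".xlsx", ".docx", ".odt", ".rtf",
    ".avi", ".mov",
}


def classify(filename):
    """Return 'preferred', 'recommended', or 'not listed' for a file name."""
    # Compare extensions case-insensitively, so DATA.CSV matches .csv.
    extension = Path(filename).suffix.lower()
    if extension in PREFERRED:
        return "preferred"
    if extension in RECOMMENDED:
        return "recommended"
    return "not listed"
```

A file reported as "not listed" is not necessarily unsuitable; it is a prompt to check the target repository's own guidance or to consider converting the file to one of the listed formats.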