Package org.opensextant.util
Class FileUtility
java.lang.Object
org.opensextant.util.FileUtility
- Author:
- ubaldino
-
Field Summary
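Modifier and Type, Field - Description
static final String APP_MIMETYPE
static final String AUD_MIMETYPE
static final String COMMENT_CHAR - Char used in config files, dict files.
static final String DATA_MIMETYPE
static final String DEFAULT_ENCODING
static final String DOC_MIMETYPE
static final String FEED_MIMETYPE
static final char FILENAME_REPLACE_CHAR - Char to use in place of special chars when scrubbing filenames.
static final String FOLDER_MIMETYPE
static final String GIS_MIMETYPE
static final String IMAGE_MIMETYPE
static final String MESSAGE_MIMETYPE
static final String NOT_AVAILABLE
static final String SPREADSHEET_MIMETYPE
static final String VID_MIMETYPE
static final String WEBARCHIVE_MIMETYPE
static final String WEBPAGE_MIMETYPE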
-
Method Summary
Modifier and Type, Method - Description
static String filenameCleaner(String fname) - Another utility to deal with unicode in filenames
static String generateUniqueFilename(String F, String Ext) - Generate some filename with a unique date/time stamp
static String generateUniquePath(String D, String F, String Ext) - Generate some path with a unique date/time stamp
static String getBasename(String p, String ext) - Get the base name of a file, given any file extension.
static String getFileDescription(String url) - Get a plain language name of the type of file.
static FilenameFilter getFilenameFilter(String ext) - Simple filter
static InputStreamReader getInputStream(File f, String enc)
static InputStreamReader getInputStream(String fname, String enc)
static InputStreamReader getInputStreamReader(File f, String enc) - Getting an input stream from a file.
static OutputStreamWriter getOutputStream(String fname, String enc) - Caller is responsible for write flush, close, etc.
static OutputStreamWriter getOutputStream(String fname, String enc, boolean append) - Caller is responsible for write flush, close, etc.
static File getParent
static File getSafeDir(File dir, String dupeMarker, int maxDups) - Get a directory that does not conflict with an existing directory.
static File getSafeFile(File f, String dupeMarker, int maxDups)
static String getValidFilename(String path) - On occasion a file path may contain Unicode chars; however, as the path is encoded, it may not be decodable by the OS/FS.
static boolean isArchiveFile(String filepath) - Check if a file is an archive
static boolean isArchiveFileType(String ext) - Allow checking of a file extension; NO prefix "."
static boolean isAudio - Checks file extension of given filepath to see if the format is a known audio type.
static boolean isImage - Using Commons getExtension(), determine if the filename represents an image media type.
static boolean isJSONGzip(String path) - Tell if the file is JSON/Gzip
static boolean isPlainText(String filepath) - Test if a path or file extension ends with .txt. NPE if null is passed in.
static boolean isSpreadsheet(String filepath) - Simple check if a file is typed as a Spreadsheet. Tab-delimited .txt files or .dat files may be valid spreadsheets, however this method does not look inside files.
static boolean isVideo - Checks file extension of given filepath to see if the format is a known video type.
static boolean isWebURL - Check if path or URL is a webpage.
static boolean isWindowsSystem() - A way of determining OS. Beware, OS X has Darwin in its full OS name.
loadDict(InputStream io, boolean case_sensitive) - The do all method.
static Set<String> loadDictionary(File resourcepath, boolean case_sensitive) - Load a word list from a file path.
static Set<String> loadDictionary(String resourcepath, boolean case_sensitive) - A generic word list loader.
static Set<String> loadDictionary(URL resourcepath, boolean case_sensitive) - A generic word list loader.
static boolean makeDirectory(File testDir) - Utility for making dirs
static boolean makeDirectory(String dir) - Utility for making dirs
protected static char normalizeFilenameChar(char c) - Tests for valid filename chars for simple normalization A-Z, a-z, _-, 0-9,
static String readFile
static String readFile - Slurps a text file into a string and returns the string.
static String readFile
static String readGzipFile(String filepath)
static boolean removeDirectory(File directory) - Java oddity - recursive removal of a directory
static boolean writeFile - Write file, UTF-8 is default charset here.
static boolean writeFile(String buffer, String fname, String enc, boolean append)
static boolean writeGzipFile(String text, String filepath)
-
Field Details
-
DEFAULT_ENCODING
-
FILENAME_REPLACE_CHAR
public static final char FILENAME_REPLACE_CHAR
Char to use in place of special chars when scrubbing filenames.
-
COMMENT_CHAR
Char used in config files, dict files.
-
IMAGE_MIMETYPE
-
DOC_MIMETYPE
-
MESSAGE_MIMETYPE
-
APP_MIMETYPE
-
VID_MIMETYPE
-
AUD_MIMETYPE
-
FOLDER_MIMETYPE
-
FEED_MIMETYPE
-
DATA_MIMETYPE
-
WEBARCHIVE_MIMETYPE
-
WEBPAGE_MIMETYPE
-
SPREADSHEET_MIMETYPE
-
NOT_AVAILABLE
-
GIS_MIMETYPE
-
Method Details
-
writeFile
Write file, UTF-8 is default charset here.
- Parameters:
buffer
- text to save
fname
- name of file to save
- Returns:
- status true if file was written
- Throws:
IOException
- if file had IO errors.
-
writeFile
public static boolean writeFile(String buffer, String fname, String enc, boolean append) throws IOException
- Parameters:
buffer
- text to save
fname
- name of file to save
enc
- text encoding
append
- true to append to an existing file
- Returns:
- status true if file was written
- Throws:
IOException
- if file had IO errors.
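A minimal sketch of the overwrite-or-append write that the four-argument writeFile describes, using only the JDK. The names WriteFileSketch and writeText are hypothetical, and returning false for null input is an assumption; this is not the FileUtility source.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

final class WriteFileSketch {
    /** Writes buffer to fname in the named charset; appends when append is true. */
    static boolean writeText(String buffer, String fname, String enc, boolean append)
            throws IOException {
        if (buffer == null || fname == null) {
            return false; // assumption: missing input is reported as "not written"
        }
        // try-with-resources flushes and closes the writer even if write() fails
        try (Writer out = new OutputStreamWriter(new FileOutputStream(fname, append), enc)) {
            out.write(buffer);
        }
        return true;
    }
}
```

Calling writeText("a", path, "UTF-8", false) and then writeText("b", path, "UTF-8", true) leaves a file containing "ab".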
-
getOutputStream
public static OutputStreamWriter getOutputStream(String fname, String enc, boolean append) throws IOException
Caller is responsible for write flush, close, etc.
- Parameters:
fname
- file path
enc
- encoding
append
- true = append data to existing file.
- Returns:
- stream writer
- Throws:
IOException
- if stream could not be opened
-
getOutputStream
Caller is responsible for write flush, close, etc.
- Parameters:
fname
- file name
enc
- text encoding
- Returns:
- stream writer
- Throws:
IOException
- if stream could not be opened
-
getInputStreamReader
Getting an input stream from a file.
- Parameters:
f
- file object
enc
- encoding of text data
- Returns:
- reader
- Throws:
IOException
- if failure reading file or using encoding.
-
getInputStream
- Throws:
IOException
-
getInputStream
- Throws:
IOException
-
isSpreadsheet
Simple check if a file is typed as a Spreadsheet. Tab-delimited .txt files or .dat files may be valid spreadsheets, however this method does not look inside files.
- Parameters:
filepath
- path to file
- Returns:
- true if file represents one of the various spreadsheet file formats
-
isImage
Using Commons getExtension(), determine if the filename represents an image media type.
- Parameters:
filepath
- path to file
- Returns:
- true if file represents any type of image
-
isVideo
Checks file extension of given filepath to see if the format is a known video type.
- Parameters:
filepath
- file name or path
- Returns:
- true if file is likely a video file format.
-
isAudio
Checks file extension of given filepath to see if the format is a known audio type.
- Parameters:
filepath
- file name or path
- Returns:
- true if file is likely an audio file format.
-
isArchiveFile
Check if a file is an archive
- Parameters:
filepath
- path to file
- Returns:
- boolean true if file ends with .zip, .tar, .tgz, .gz (includes .tar.gz)
-
isArchiveFileType
Allow checking of a file extension; NO prefix "."
- Parameters:
ext
- extension to test
- Returns:
- boolean true if file ends with .zip, .tar, .tgz, .gz (includes .tar.gz)
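The two archive checks can be pictured as one extension lookup plus a helper that pulls the extension off a path. The sketch below assumes case-insensitive matching and uses hypothetical names (ArchiveCheckSketch, isArchiveExtension, isArchivePath); the actual FileUtility logic may differ.

```java
import java.util.Locale;

final class ArchiveCheckSketch {
    /** ext is given without the leading ".", e.g. "zip". */
    static boolean isArchiveExtension(String ext) {
        if (ext == null) {
            return false;
        }
        switch (ext.toLowerCase(Locale.ROOT)) { // assumption: case does not matter
            case "zip":
            case "tar":
            case "tgz":
            case "gz":
                return true;
            default:
                return false;
        }
    }

    /** Tests the text after the last "." of the path; "x.tar.gz" is covered via "gz". */
    static boolean isArchivePath(String filepath) {
        if (filepath == null) {
            return false;
        }
        int dot = filepath.lastIndexOf('.');
        return dot >= 0 && isArchiveExtension(filepath.substring(dot + 1));
    }
}
```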
-
isPlainText
Test if a path or file extension ends with .txt. NPE if null is passed in.
- Parameters:
filepath
- path or extension, including "."
- Returns:
- true if is .txt or .TXT
-
readFile
- Parameters:
filepath
- path to file
- Returns:
- buffer from file
- Throws:
IOException
- on error
-
readFile
- Parameters:
filepath
- path to file
- Returns:
- buffer from file
- Throws:
IOException
- on error
-
readFile
Slurps a text file into a string and returns the string.
- Parameters:
fileinput
- file object
enc
- text encoding
- Returns:
- buffer from file
- Throws:
IOException
- on error
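For comparison, slurping a whole file into a string needs little more than a byte read and a charset decode in the JDK. This is a sketch with hypothetical names (SlurpSketch, slurp), not the FileUtility implementation; note that Charset.forName throws an unchecked exception for an unknown charset name.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;

final class SlurpSketch {
    /** Reads the entire file into memory and decodes it with the named charset. */
    static String slurp(File fileinput, String enc) throws IOException {
        byte[] raw = Files.readAllBytes(fileinput.toPath());
        return new String(raw, Charset.forName(enc));
    }
}
```

Because the whole file is held in memory, this approach suits configuration files and word lists better than very large inputs.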
-
readGzipFile
- Parameters:
filepath
- path to file
- Returns:
- text buffer, UTF-8 decoded
- Throws:
IOException
- on error
-
writeGzipFile
- Parameters:
text
- buffer to write
filepath
- path to file
- Returns:
- status true if file was written
- Throws:
IOException
- on error
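A gzip text round trip in the spirit of writeGzipFile and readGzipFile might look like the sketch below. It assumes UTF-8 on the write side as well (the page only states it for reading), and the names GzipTextSketch, writeGzip and readGzip are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipTextSketch {
    /** Compresses text into filepath; assumption: text is encoded as UTF-8. */
    static boolean writeGzip(String text, String filepath) throws IOException {
        try (OutputStream out = new GZIPOutputStream(new FileOutputStream(filepath))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return true;
    }

    /** Decompresses filepath fully and decodes the bytes as UTF-8. */
    static String readGzip(String filepath) throws IOException {
        try (InputStream in = new GZIPInputStream(new FileInputStream(filepath))) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```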
-
makeDirectory
Utility for making dirs
- Parameters:
testDir
- dir to test
- Returns:
- true if directory was created or if it already exists
- Throws:
IOException
- if testDir was not created
-
makeDirectory
Utility for making dirs
- Parameters:
dir
- dirPath
- Returns:
- true if directory was created or if it already exists
- Throws:
IOException
- if dir was not created
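The documented contract (true when the directory exists or was created, IOException otherwise) can be sketched with java.io.File alone. MakeDirSketch and ensureDir are hypothetical names, and returning false for a null argument is an assumption.

```java
import java.io.File;
import java.io.IOException;

final class MakeDirSketch {
    /** Returns true if dir already exists or was created; throws if creation fails. */
    static boolean ensureDir(File dir) throws IOException {
        if (dir == null) {
            return false; // assumption: null is reported rather than thrown
        }
        if (dir.isDirectory()) {
            return true;
        }
        // mkdirs() also creates missing parents; re-check in case another
        // thread or process created the directory in the meantime
        if (dir.mkdirs() || dir.isDirectory()) {
            return true;
        }
        throw new IOException("Could not create directory " + dir.getAbsolutePath());
    }
}
```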
-
removeDirectory
Java oddity - recursive removal of a directory
- Parameters:
directory
- dir to remove
- Returns:
- true if all contents and the directory itself were removed.
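The "oddity" is that File.delete() refuses to remove a non-empty directory, so the contents have to be deleted first. A depth-first sketch follows, with hypothetical names (RemoveDirSketch, removeTree); it does not treat symbolic links specially, which a production version may want to do.

```java
import java.io.File;

final class RemoveDirSketch {
    /** Deletes children first, then the directory; true only if everything was removed. */
    static boolean removeTree(File directory) {
        if (directory == null || !directory.exists()) {
            return false;
        }
        boolean ok = true;
        File[] children = directory.listFiles(); // null when directory is a plain file
        if (children != null) {
            for (File child : children) {
                ok &= removeTree(child);
            }
        }
        // delete() is always attempted, but the result also reflects child failures
        return directory.delete() && ok;
    }
}
```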
-
generateUniquePath
Generate some path with a unique date/time stamp
- Parameters:
D
- directory
F
- filename
Ext
- file extension
- Returns:
- unique path
-
generateUniqueFilename
Generate some filename with a unique date/time stamp
- Parameters:
F
- filename
Ext
- file extension
- Returns:
- unique filename
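The page does not state the stamp format or whether Ext carries its leading dot, so the sketch below makes both up: a millisecond-resolution stamp and an extension passed with the dot. UniqueNameSketch and its methods are hypothetical. A timestamp alone is only unique down to its resolution, so two calls within the same millisecond would collide.

```java
import java.io.File;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class UniqueNameSketch {
    // assumption: date, underscore, then time down to milliseconds
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd_HHmmssSSS");

    /** Builds name_stamp.ext; ext is assumed to include the leading ".". */
    static String uniqueFilename(String f, String ext) {
        return f + "_" + LocalDateTime.now().format(STAMP) + ext;
    }

    /** Same as uniqueFilename, placed under directory d. */
    static String uniquePath(String d, String f, String ext) {
        return d + File.separator + uniqueFilename(f, ext);
    }
}
```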
-
getParent
- Parameters:
f
- the file in question.
- Returns:
- the parent File of a given file.
-
getFilenameFilter
Simple filter
- Parameters:
ext
- the extension to filter on
- Returns:
- filename filter
-
getBasename
Get the base name of a file, given any file extension. This will find the right-most instance of a file extension and return the left hand side of that as the file basename. Commons IO FilenameUtils says nothing about arbitrarily long file extensions, e.g., file.a.b.c.txt split into ("file" + "a.b.c.txt").
- Parameters:
p
- path
ext
- extension
- Returns:
- basename of path, less the extension
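The right-most-extension rule can be sketched as a lastIndexOf on the file name. Whether the real method also strips leading directories or accepts the extension with a dot is not stated here, so both are assumptions in this sketch; BasenameSketch and basename are hypothetical names.

```java
final class BasenameSketch {
    /** Returns the file name with the right-most occurrence of ext (and what follows) removed. */
    static String basename(String p, String ext) {
        if (p == null) {
            return null;
        }
        // assumption: directories are dropped before the extension is matched
        int slash = Math.max(p.lastIndexOf('/'), p.lastIndexOf('\\'));
        String name = p.substring(slash + 1);
        if (ext == null || ext.isEmpty()) {
            return name;
        }
        // accept "txt" as well as ".txt"
        String suffix = ext.startsWith(".") ? ext : "." + ext;
        int at = name.lastIndexOf(suffix);
        return at > 0 ? name.substring(0, at) : name;
    }
}
```

With these assumptions, basename("/tmp/file.a.b.c.txt", "a.b.c.txt") yields "file", matching the split described in the method text.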
-
getValidFilename
On occasion a file path may contain Unicode chars; however, as the path is encoded, it may not be decodable by the OS/FS.
- Parameters:
path
- path to normalize
- Returns:
- filename
-
filenameCleaner
Another utility to deal with unicode in filenames
- Parameters:
fname
- name to clean
- Returns:
- cleaner filename
-
getSafeDir
Get a directory that does not conflict with an existing directory. Returns null if that is not possible within the maxDups.
- Parameters:
dir
- directory
dupeMarker
- incrementor
maxDups
- max incrementor
- Returns:
- file object
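One way to read the dupeMarker and maxDups parameters is as a suffix followed by a counter that is tried up to a limit. The sketch below takes that reading as an assumption (the exact naming scheme is not given on this page) and uses hypothetical names SafeNameSketch and safeDir.

```java
import java.io.File;

final class SafeNameSketch {
    /** Returns dir itself if unused, else dir + marker + n for the first free n, else null. */
    static File safeDir(File dir, String dupeMarker, int maxDups) {
        if (!dir.exists()) {
            return dir;
        }
        for (int n = 1; n <= maxDups; n++) {
            // assumption: candidates are named like "output_1", "output_2", ...
            File candidate = new File(dir.getParentFile(), dir.getName() + dupeMarker + n);
            if (!candidate.exists()) {
                return candidate;
            }
        }
        return null; // no free name within maxDups attempts
    }
}
```

The returned File is only a name that was free at the time of the check; the caller still has to create it.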
-
getSafeFile
- Parameters:
f
- file object
dupeMarker
- incrementor
maxDups
- max incrementor
- Returns:
- new file
-
normalizeFilenameChar
protected static char normalizeFilenameChar(char c)
Tests for valid filename chars for simple normalization A-Z, a-z, _-, 0-9,
- Parameters:
c
- character to allow
- Returns:
- given character or replacement char
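A whitelist check of this kind is a few range comparisons. The sketch covers only the characters the description actually names (the list there is cut off) and assumes '_' as the replacement, since the value of FILENAME_REPLACE_CHAR is not shown on this page. FilenameCharSketch and its members are hypothetical.

```java
final class FilenameCharSketch {
    static final char REPLACE = '_'; // assumption: stand-in for FILENAME_REPLACE_CHAR

    /** Keeps A-Z, a-z, 0-9, '_' and '-'; everything else becomes REPLACE. */
    static char normalizeChar(char c) {
        boolean allowed = (c >= 'A' && c <= 'Z')
                || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '_'
                || c == '-';
        return allowed ? c : REPLACE;
    }

    /** Applies normalizeChar to every character of name. */
    static String clean(String name) {
        StringBuilder sb = new StringBuilder(name.length());
        for (int i = 0; i < name.length(); i++) {
            sb.append(normalizeChar(name.charAt(i)));
        }
        return sb.toString();
    }
}
```

In this sketch a "." is replaced as well, so a caller would normally clean the base name and re-attach the extension afterwards.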
-
isWindowsSystem
public static boolean isWindowsSystem()
A way of determining OS. Beware, OS X has Darwin in its full OS name.
- Returns:
- true if OS is Windows-based
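The Darwin warning is about substring tests: "darwin" contains "win", so a naive contains("win") check would misreport macOS as Windows. A prefix test on the os.name system property avoids that. OsCheckSketch and its methods are hypothetical names, and the actual check used by FileUtility is not shown on this page.

```java
import java.util.Locale;

final class OsCheckSketch {
    /** True only when the OS name begins with "windows", ignoring case. */
    static boolean looksLikeWindows(String osName) {
        return osName != null && osName.toLowerCase(Locale.ROOT).startsWith("windows");
    }

    /** Applies the check to the running JVM's os.name property. */
    static boolean isWindows() {
        return looksLikeWindows(System.getProperty("os.name"));
    }
}
```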
-
loadDictionary
public static Set<String> loadDictionary(String resourcepath, boolean case_sensitive) throws IOException
A generic word list loader.
- Parameters:
resourcepath
- classpath location of a resource
case_sensitive
- if terms are loaded with case preserved or not.
- Returns:
- Set containing unique words found in resourcepath
- Throws:
IOException
- on error, resource does not exist
-
loadDictionary
public static Set<String> loadDictionary(URL resourcepath, boolean case_sensitive) throws IOException
A generic word list loader.
- Parameters:
resourcepath
- classpath location of a resource
case_sensitive
- if terms are loaded with case preserved or not.
- Returns:
- Set containing unique words found in resourcepath
- Throws:
IOException
- on error, resource does not exist
-
loadDict
The do all method. Load the dictionary from stream. This closes the stream when done.
- Parameters:
io
- stream
case_sensitive
- true if data should be loaded preserving case
- Returns:
- set of phrases from file.
- Throws:
IOException
- on IO error
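A word list loader along these lines reads one term per line into a Set and closes the stream at the end. The sketch assumes UTF-8 input, "#" as the comment marker (the value of COMMENT_CHAR is not shown on this page), skipping of blank lines, and lower-casing when case is not preserved. WordListSketch and load are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class WordListSketch {
    /** Loads unique, trimmed terms from the stream and closes it when done. */
    static Set<String> load(InputStream io, boolean caseSensitive) throws IOException {
        Set<String> terms = new HashSet<>();
        try (BufferedReader reader =
                new BufferedReader(new InputStreamReader(io, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String term = line.trim();
                // assumption: blank lines and lines starting with "#" are ignored
                if (term.isEmpty() || term.startsWith("#")) {
                    continue;
                }
                terms.add(caseSensitive ? term : term.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }
}
```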
-
loadDictionary
public static Set<String> loadDictionary(File resourcepath, boolean case_sensitive) throws IOException
Load a word list from a file path.
- Parameters:
resourcepath
- File object to load
case_sensitive
- if dictionary is loaded with case or not.
- Returns:
- a Set object containing distinct dictionary terms
- Throws:
IOException
- if load fails
-
getFileDescription
Get a plain language name of the type of file. E.g., document, image, spreadsheet, web page. Rather than the MIME type technical descriptor.
- Parameters:
url
- item to describe
- Returns:
- plain language description of the URL
-
isWebURL
Check if path or URL is a webpage. This is helpful for looking at found URLs in unstructured data.
- Parameters:
link
- a URL
- Returns:
- true if link looks like a URL (i.e., if it starts with http: or https:)
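The documented rule is a simple prefix test. In the sketch below the case-insensitive comparison and the null handling are assumptions, and WebUrlSketch and looksLikeWebUrl are hypothetical names.

```java
import java.util.Locale;

final class WebUrlSketch {
    /** True if link starts with "http:" or "https:"; the rest of the URL is not validated. */
    static boolean looksLikeWebUrl(String link) {
        if (link == null) {
            return false;
        }
        String lower = link.toLowerCase(Locale.ROOT); // assumption: scheme case is ignored
        return lower.startsWith("http:") || lower.startsWith("https:");
    }
}
```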
-
isJSONGzip
Tell if the file is JSON/Gzip
- Parameters:
path
- input file path
- Returns:
- true if file ends with json.gz or contains json and ends with .gz
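Both documented conditions reduce to "ends with .gz and mentions json", since a name ending in json.gz satisfies the second condition too. A sketch under that reading follows, with assumed case-insensitive matching and hypothetical names (JsonGzipSketch, looksLikeJsonGzip).

```java
import java.util.Locale;

final class JsonGzipSketch {
    /** True for names such as "data.json.gz" or "json-export-01.gz". */
    static boolean looksLikeJsonGzip(String path) {
        if (path == null) {
            return false;
        }
        String lower = path.toLowerCase(Locale.ROOT); // assumption: case is ignored
        return lower.endsWith(".gz") && lower.contains("json");
    }
}
```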
-